A high-density collection of EMS-induced mutations for TILLING in Landsberg erecta genetic background of Arabidopsis

Background: Arabidopsis thaliana is the main model species for plant molecular genetics studies, and worldwide efforts are devoted to identifying the function of all its genes. To this end, reverse genetics by TILLING (Targeting Induced Local Lesions IN Genomes) in a permanent collection of chemically induced mutants is providing a unique resource in the Columbia genetic background. In this work, we aim to extend the TILLING resources available in A. thaliana by developing a new population of ethyl methanesulphonate (EMS) induced mutants in the second most common reference strain. In addition, we aim to saturate the number of EMS induced mutations that can be tolerated by viable and fertile plants.
Results: By mutagenizing with different EMS concentrations we have developed a permanent collection of 3712 M2/M3 independent mutant lines in the reference strain Landsberg erecta (Ler) of A. thaliana. This population has been named the Arabidopsis TILLer collection. The frequency of mutations per line was maximized by using M1 plants with low but sufficient seed fertility. Application of TILLING to search for mutants in 14 genes identified 21 to 46 mutations per gene, which correspond to a total of 450 mutations. Missense mutations were found for all genes, while truncations were selected for all except one. We estimated that, on average, these lines carry one mutation every 89 kb, so that the Ler population provides a total of more than five million induced mutations. The TILLer collection is estimated to show a two- to threefold higher EMS mutation density per individual than the previously reported A. thaliana population.
Conclusions: Analysis of the TILLer collection demonstrates its usefulness for large scale TILLING reverse genetics in another reference genetic background of A. thaliana. Comparisons with TILLING populations in other organisms indicate that this new A. thaliana collection carries the highest chemically induced mutation density per individual known in diploid species.


Background
A major challenge in plant biology is the identification of biological functions for all genes from the main model plant species, Arabidopsis thaliana and rice. To this end, a large number of genetics and genomics resources are being developed in both model plants [1,2]. In particular, collections of induced mutants that can be screened by reverse genetics, such as T-DNA or transposon insertional mutants [3][4][5], provide a unique resource for functional studies. However, the mutational spectrum of insertional mutagenesis with effect on gene function is mostly limited to gene knock-out disruptions. Genes whose severe loss-of-function is lethal or highly pleiotropic cannot be functionally dissected with such mutants. In addition, the size of saturated populations containing insertion mutants randomly generated for most genes of an organism is extremely high because each line carries only a rather small number of mutations [6]. As a complementary resource, chemically induced mutants have been shown to provide an efficient alternative because each individual line can bear single point missense and nonsense substitutions in hundreds of genes [7]. Therefore, an allelic series of induced mutations with different effects on gene function can be easily isolated by screening a few thousand mutagenized plants [6].
In the past few years, chemically induced mutants have become a major resource for reverse genetics studies thanks to the development of TILLING (Targeting Induced Local Lesions IN Genomes) [8]. TILLING enables the reverse selection of single point mutations by cleavage of mismatches in heteroduplex DNA with the endonuclease CEL I. This powerful strategy was first applied in an A. thaliana mutant collection induced with ethyl methanesulphonate (EMS) [9,10] in the most common genetic background, Columbia (Col), whose genome sequence was the first to be completed [11]. Since then, TILLING collections of EMS induced mutants have been developed in a large number of plant species including rice, maize, barley, sorghum, wheat, Brassica napus, B. oleracea and Medicago truncatula, as well as model animals like Drosophila and Caenorhabditis elegans [12][13][14][15][16][17][18][19][20][21]. In most of these EMS mutant collections, reference genetic backgrounds of wide and general interest are used. However, given the limitations of having mutations in a single genetic background, new populations of chemically induced mutants for TILLING analyses are currently being developed in other reference strains of several species like rice or soybean [22,23]. In addition, the quality of TILLING mutant populations is determined by the density of mutations per individual, since this limits the size of the allelic series that can be isolated for each gene and the size of a saturated genome population. For this reason, other TILLING populations have been developed in rice, barley, soybean or M. truncatula, aiming to increase the number of mutations per line by either using different mutagens like sodium azide and N-methyl-N-nitrosourea or increasing the mutagen dose [12][22][23][24][25].
In A. thaliana, several reference genetic backgrounds are widely used such as Col or Landsberg erecta (Ler). The latter is the second most commonly studied strain because many mutants have been classically isolated in it and a large portion of its genome sequence was available soon after Col sequence [26]. In this work we have developed a new collection of A. thaliana EMS induced mutants for TILLING reverse genetics, aiming at two major objectives. First, to extend TILLING resources in A. thaliana by using Ler reference genetic background, for which reverse genetic tools are rather limited. Second, to enrich the number of independent mutations available in this collection as much as possible by increasing the density of mutations per line. TILLING evaluation of this population for several gene fragments indicates that it carries the largest density of chemically induced mutations reported in diploid organisms, hence demonstrating its usefulness for reverse selection of mutants.

Generation of a permanent collection of highly EMS-mutagenized lines in Arabidopsis
To obtain a new population of chemically induced mutant lines useful for reverse genetic studies in Arabidopsis thaliana, seeds of the Landsberg erecta (Ler) glabrous1-1 genotype were mutagenized with EMS at concentrations of 20 to 50 mM (Figure 1). The effects of EMS and the efficiency of the mutagenesis treatment were estimated by quantifying three parameters on M1 plants: seed germination, frequency of albino chimeras and fertility (see Methods). Germination of M1 seeds was negatively correlated with EMS dose (r = -0.93; p = 0.008), while the frequency of M1 albino chimeras increased with concentration (r = 0.99; p = 0.001) (Figure 1A and 1B). Seed fertility of M1 plants and the degree of M2 embryo lethality were quantified by estimating the proportion of fully or nearly sterile fruits (classes As and Aa) and the proportion of semi or normal fertile fruits (classes B and C). As shown in Figure 1C, the total frequency of class A fruits increased linearly with EMS concentration, whereas the frequency of fertile fruits rapidly decreased. To maximize the frequency of mutations per individual, only M1 plants from treatments showing a frequency of fertile fruits smaller than 35% but larger than 2% were individually harvested. A total of 3712 M2 families were grown to isolate individual M2 DNA and to harvest their M3 offspring seeds. To ensure independence of the mutations present in this population, a single M2 plant was harvested from each M1 plant. In agreement with the high proportion of embryo lethality, all M2 families segregated for easily visible morphological mutations (data not shown). Fifty-six percent of M2 lines were derived from 25 mM EMS mutagenesis, and on average, EMS treatments used to obtain the collection show less than 25% fertile fruits (Table 1). The DNA of M2 plants and the M3 seeds of the 3712 lines were stored (see Methods), providing a permanent population of mutant lines for TILLING analysis in the Ler genetic background. This population has been named the Arabidopsis TILLer collection.

Mutation frequency, distribution and functional spectrum in TILLer collection
The quality of this collection for mutant discovery was evaluated after the analysis of 14 gene fragments distributed among four of the five A. thaliana chromosomes and chosen from requests by different research laboratories (see Methods; Figure 2 and Table 2). These fragments have a GC content similar to that of fragments studied in the Col background TILLING collection [10] and to average coding regions of the A. thaliana genome [11]. On average, amplicons were 1.1 kb long and 62.2% corresponded to exon sequence, which is similar to the genome average exon proportion [11]. In total, we found and confirmed by sequencing 450 mutations in the 15.7 kb analyzed from different amplicons. All mutations corresponded to G/C to A/T transitions, in agreement with the nearly unique type of nucleotide substitutions observed in previous analyses of EMS-mutagenized A. thaliana plants [10]. The distribution of mutations among EMS doses departed significantly from that expected from the number of lines per dose (Table 1; χ2 = 28.9; df = 4; p < 0.0001). A larger number of mutations was found at 30, 35 and 40 mM than expected from the number of lines, while the opposite was found at 25 mM. In addition, 34 lines carried several mutations in the same or two different gene fragments (Table 1). An excess of these lines was found in plants derived from high EMS concentrations (≥30 mM), while a deficiency appeared in low EMS concentrations (≤25 mM), when compared with the number expected according to the proportion of lines from each EMS dose (χ2 = 5.8; df = 1; p = 0.016). Therefore, the higher the EMS concentration used to obtain the lines, the larger the number of mutations per line.
On average we analyzed 2972 lines per fragment and detected 10.8 mutations per 1000 mutant lines (Table 2). Twenty-one to 46 mutations were found per fragment, and in most gene fragments there was a reduction of mutation detection in the ~100 bp terminal segments (Figure 2), as expected from the LI-COR detection system (see Methods). However, mutations appeared evenly distributed along the rest of the gene fragments within exon and intron regions (Figure 2).
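The detection rate quoted above follows directly from the totals reported in this section, as the following minimal Python sketch illustrates. The function names are hypothetical. The second function gives only an unadjusted estimate of screened sequence per mutation: it ignores any adjustment for, for example, reduced detection in terminal segments, so it is not expected to reproduce the published figure of one mutation every 89 kb.

```python
def mutations_per_1000_lines(total_mutations, n_fragments, mean_lines_screened):
    """Mean number of mutations detected per fragment, scaled to 1000 lines."""
    per_fragment = total_mutations / n_fragments
    return 1000 * per_fragment / mean_lines_screened


def naive_kb_per_mutation(total_kb_screened, mean_lines_screened, total_mutations):
    """Unadjusted kb of screened sequence per detected mutation."""
    return total_kb_screened * mean_lines_screened / total_mutations


# Values reported in the text: 450 mutations, 14 fragments,
# 2972 lines per fragment on average, 15.7 kb analyzed in total.
rate = mutations_per_1000_lines(450, 14, 2972)
density = naive_kb_per_mutation(15.7, 2972, 450)
print(f"{rate:.1f} mutations per 1000 lines")
print(f"one mutation per {density:.0f} kb (unadjusted)")
```

The first value agrees with the 10.8 mutations per 1000 lines given in the text; the second is a rough upper bound on the spacing between mutations before any correction is applied.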
For all but one gene fragment we found mutations of three classes according to their predicted effects on protein structure: silent, missense and truncation mutations (Table 2). The observed frequencies of the three classes of mutations fitted the expected frequencies of silent, missense and truncations, respectively, as estimated by CODDLE analyses (χ2 = 1.7; df = 2; p = 0.42). Truncations include nonsense mutations generating premature stop codons and mutations in intron splice sites, the observed frequencies of both classes (2.5% and 1.8%, respectively) also fitting expected frequencies (4.0% and 1.1%) (χ2 = 2.8; df = 1; p = 0.09). Interestingly, truncation mutations were obtained for 13 of the 14 fragments, as expected from their 5.1% frequency and the large average number of mutations found per gene (probability 1 - (1 - 0.05)^32 = 0.81).
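The probability quoted in parentheses is that of finding at least one truncation among the mutations recovered for a gene, assuming each mutation is independently a truncation with the same probability. A minimal Python sketch of that calculation follows; the function name is hypothetical, and the inputs (a frequency of 0.05 and 32 mutations per gene) are the rounded values used in the text.

```python
def prob_at_least_one(class_frequency, n_mutations):
    """Probability that at least one of n independent mutations
    belongs to a class occurring with the given frequency."""
    return 1 - (1 - class_frequency) ** n_mutations


# Rounded values used in the text: 5% truncations, 32 mutations per gene.
p_truncation = prob_at_least_one(0.05, 32)
print(round(p_truncation, 2))
```

Under the same assumptions, a probability of about 0.81 per fragment corresponds to roughly 11 of 14 fragments expected to yield a truncation, in line with the 13 observed.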
As shown in Table 2, an average ratio of heterozygous/homozygous mutations of 3.7 was found, which is significantly different from the expected 2:1 proportion for M2 plants (χ2 = 30.2; df = 1; p < 0.0001). Although an excess of heterozygotes appeared for silent mutations (p < 0.01), this ratio was extreme for truncations, since all but one of such mutations were present as heterozygotes. In addition, distortion from the expected proportion was larger for high EMS dose lines (35-40 mM) than for low concentrations (25-30 mM) (Table 1).
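The comparison against the 2:1 expectation is a chi-square goodness-of-fit test with one degree of freedom. The Python sketch below shows the form of that test under stated assumptions: the function name is hypothetical, and the counts of 354 heterozygous and 96 homozygous mutations are illustrative values back-calculated from the reported ratio of 3.7 and the total of 450 mutations, not counts taken from Table 2. The resulting statistic therefore only approximates the published value.

```python
def chi_square_vs_ratio(observed_het, observed_hom, expected_ratio=2.0):
    """Chi-square statistic (df = 1) for observed heterozygous and homozygous
    counts against an expected heterozygous:homozygous ratio."""
    total = observed_het + observed_hom
    expected_het = total * expected_ratio / (expected_ratio + 1)
    expected_hom = total - expected_het
    return ((observed_het - expected_het) ** 2 / expected_het
            + (observed_hom - expected_hom) ** 2 / expected_hom)


# Illustrative counts only (assumption): 354:96 gives a ratio close to 3.7.
chi2 = chi_square_vs_ratio(354, 96)
print(f"chi-square = {chi2:.2f}")
```

Any value above 3.84 rejects the 2:1 hypothesis at the 5% level for one degree of freedom, so the illustrative counts lead to the same conclusion as the reported test.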

Discussion
We have developed a new permanent collection of 3712 independent EMS-induced mutant lines for reverse genetic analysis in the reference laboratory strain Landsberg erecta of A. thaliana. To maximize the number of mutations present in this population we have increased the frequency of mutations per M2/M3 line by using M1 plants with lower seed fertility than that of the plants used to obtain the existing population in the Columbia background [9]. By compromising fertility, we aimed to saturate the number of chemically induced mutations that can be tolerated by A. thaliana plants that are still viable and capable of sexual reproduction. We estimated that, on average, the lines of this new Ler collection carry one mutation every 89 kb, which is a significantly higher density than the 1/300 kb estimated in the current Col population [10]. As expected, we found that the higher the EMS concentration, the higher the density of detected mutations per line. Thus, experimental control of EMS mutagenesis enables a substantial increase in the frequency of induced mutations in viable and seed fertile plants. However, we cannot rule out that the mutation density differences between the two TILLING populations of A. thaliana might be partly due to natural genetic variation between the two wild type strains in their tolerance to chemically induced mutations. Accordingly, it could be speculated that such natural variation might be determined by variation in reproductive system plasticity or in DNA repair mechanisms.
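To give a sense of scale for these densities, the Python sketch below converts a spacing between mutations into an approximate number of mutations per line and per collection. The genome size of 125 Mb is an assumption introduced here for illustration and is not stated in the text; the function name is likewise hypothetical.

```python
GENOME_MB = 125  # assumption: approximate A. thaliana genome size in Mb


def mutations_per_line(genome_mb, kb_per_mutation):
    """Approximate number of induced mutations carried by one line."""
    return genome_mb * 1000 / kb_per_mutation


per_line_ler = mutations_per_line(GENOME_MB, 89)   # Ler: one mutation per 89 kb
per_line_col = mutations_per_line(GENOME_MB, 300)  # Col: one mutation per 300 kb
total_ler = per_line_ler * 3712                    # 3712 lines in the collection

print(f"Ler: about {per_line_ler:.0f} mutations per line")
print(f"Col: about {per_line_col:.0f} mutations per line")
print(f"Ler collection: about {total_ler / 1e6:.1f} million mutations")
```

With this assumed genome size, the collection-wide total is consistent with the estimate of more than five million induced mutations.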
As described by Greene et al. [10], estimations of the density of chemically induced mutations detected by the TILLING procedure can be biased due to several factors such as: 1) uneven mutation detection among the pools of eight plants; 2) uneven mutation detection along the length of gene fragments; 3) higher GC content of analyzed fragments (41%) than average genome (35%) [11]; and 4)  Table 3). Only rapeseed and wheat collections carry a higher density of mutations, as expected from their polyploid nature and, consequently, their higher tolerance to loss-of-function mutations due to gene duplications and redundancies (Table 3). Thus, the density of mutations found in the Ler A. thaliana population increases the estimated load of chemically induced mutations that diploid species can tolerate in sexually fertile individuals. The two A. thaliana TILLING populations, Ler and Col, also differ in the proportion of heterozygous:homozygous mutations recovered in TILLING analyses, Ler showing a substantially higher total average ratio than Col (3.7 versus 2.1, respectively) [10]. The largest deficiency of homozygous mutations corresponds to truncations, which show the largest difference between the two populations (ratio of 3.7 vs. 19 for Col and Ler, respectively). Therefore, a stronger negative selection against deleterious mutations seems to affect the Ler collection than the Col collection. This is probably a consequence of the extremely high density of mutations present in Ler lines, since the maximum load of deleterious induced mutations that can be tolerated by a viable and fertile M2 plant will likely be determined by a threshold number of homozygous truncations and deleterious missense mutations. M2 plants carrying a higher number of homozygous deleterious mutations than this threshold will not be viable or fertile. Given the self-fertilizing nature of A. thaliana, the higher the M1 mutation density, the higher the proportion of M2 offspring plants that will surpass the maximum number of homozygous deleterious mutations and, consequently, the stronger the selection against such mutations. Thus, higher M1 mutation densities will lead to higher M2 ratios of heterozygous/homozygous mutations due to a lower frequency of M2 plants below the threshold of homozygous deleterious mutations. This relationship is supported by the larger ratios observed in Ler lines with high mutation density generated with EMS doses ≥35 mM than in lines with lower density obtained with 25-30 mM EMS. Nevertheless, presumed silent mutations, including synonymous and intronic mutations, also showed a significant deficit of homozygotes in the Ler collection, whereas this was not observed for missense mutations. Potential genetic mechanisms accounting for this unexpected result are unknown, but it cannot be ruled out that the genes surveyed in this work are biased for the deleterious effect of their mutations. In agreement, other A. thaliana public mutant collections do not contain mutations in several of the genes analyzed here (http://www.arabidopsis.org), suggesting that mutations in their coding and non-coding regions show stronger deleterious effects than the genome average.

Conclusions
The TILLer collection generated in this work provides a new resource for reverse selection of EMS induced mutants in A. thaliana. The high mutation density of this population increases the size of allelic series that can be obtained and reduces the population size that needs to be screened. However, this high mutation density implies that more backcrosses are required to eliminate undesired background mutations in selected mutant lines. It has been estimated that 20 mutations are necessary to have 0.95 probability of finding a missense deleterious mutation per gene [6]. Considering the ~50% observed frequency of missense mutations and assuming that 25% of them are deleterious, we have calculated that, on average, 1774 TILLer lines are sufficient to obtain 20 mutations per 1 kb gene fragment, while the larger screens carried out so far also provide truncation mutations for ~90% of the genes. Currently, the TILLer collection is screened as a public service to search for mutants in genes of interest for any laboratory (http://www.cnb.csic.es/tiller/). The availability of another TILLING service in the second most common reference genetic background of A. thaliana enables deeper gene functional studies such as those aiming to uncover new gene effects or interactions of specific mutations with genetic backgrounds. Given the success of existing collections, it can be expected that the use of chemically induced genetic variation will expand further in the near future with the development of similar resources in other reference strains and/or using other mutagens.
Figure 2. Gene fragments analyzed and mutations found in TILLer collection. Positions of silent, missense, splicing site and nonsense mutations are indicated by grey arrowheads, black arrowheads, grey asterisks and black asterisks, respectively.
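The number of lines that must be screened to reach a target number of mutations in a fragment scales directly with the spacing between mutations. The Python sketch below illustrates this relationship; the function name is hypothetical, and it uses the rounded density of one mutation per 89 kb, so it gives a value close to, but not identical with, the 1774 lines quoted in the Conclusions.

```python
import math


def lines_needed(target_mutations, kb_per_mutation, fragment_kb=1.0):
    """Lines to screen so that a fragment of the given length is expected
    to yield the target number of mutations."""
    return math.ceil(target_mutations * kb_per_mutation / fragment_kb)


ler_lines = lines_needed(20, 89)    # rounded Ler density from the text
col_lines = lines_needed(20, 300)   # Col density quoted in the Discussion
print(ler_lines, col_lines)
```

The same calculation with the Col density shows why a higher mutation density per line reduces the population size that needs to be screened.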

Mutagenesis
Seeds of the laboratory strain Landsberg erecta carrying the marker mutation glabrous1-1 were mutagenized with ethyl methanesulphonate (EMS) [9]. Fresh M0 seeds were treated with 20, 25, 30, 35, 40 or 50 mM EMS for 17 hours in 10 ml vials containing 2500 seeds. Three to eight batches of 2500 seeds (vials) were treated at each dose. After thorough washing, M1 seeds were sown in pots with a soil:vermiculite mix at 3:1 proportion, in a 20°C greenhouse supplemented with lamps to provide a 16 hours light:8 hours darkness photoperiod. To estimate the EMS effects and the quality of the mutagenesis we quantified germination of M1 seeds and the proportion of chimeric M1 plants that showed albino or yellow sectors at the vegetative stage of six to eight leaves (albino chimeras). In addition, seed fertility and degree of embryo lethality of the M1 plants were estimated as previously described [9] with the following modifications. For each EMS dose, 10 mature siliques of the main inflorescence from 10 M1 plants were dissected under a stereomicroscope and the number of normal and aborted seeds was counted. From these data, fruits were classified into four classes according to their proportion of normal M2 seeds and M2 defective embryos. Class As is completely sterile and has no seed, either normal or aborted; class Aa has a 3:1 proportion of normal:aborted seeds, or smaller (aborted seeds >20%); class B shows 4:1 to 13:1 proportions (20% ≥ aborted seeds >6.7%); and class C has 14:1 or larger proportion (nearly normal fertile fruits with less than 6.7% aborted seeds). M1 plants were individually harvested, and treatments with a frequency of fertile fruits (B+C) larger than 2% and smaller than 35% were used to generate the M2/M3 lines that are part of the TILLer collection (Table 1). Mutagenized batches with more than 35% or less than 2% fertile fruits were discarded independently of the concentration of their EMS dose.
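The fruit classification and batch selection rules above can be summarized in a short Python sketch. It is an illustration of the thresholds stated in the text, not the procedure actually used to score the siliques; the function names are hypothetical, and a fruit with exactly 6.7% aborted seeds, which the text does not assign, is placed in class C here.

```python
def classify_fruit(normal_seeds, aborted_seeds):
    """Assign a silique to class As, Aa, B or C from its seed counts."""
    total = normal_seeds + aborted_seeds
    if total == 0:
        return "As"  # completely sterile: no seed, normal or aborted
    aborted_fraction = aborted_seeds / total
    if aborted_fraction > 0.20:
        return "Aa"  # 3:1 normal:aborted or smaller
    if aborted_fraction > 0.067:
        return "B"   # 4:1 to 13:1
    return "C"       # 14:1 or larger, nearly normal fertility


def fertile_fraction(fruit_classes):
    """Fraction of fruits in the fertile classes B and C."""
    fertile = sum(1 for c in fruit_classes if c in ("B", "C"))
    return fertile / len(fruit_classes)


def treatment_is_kept(fruit_classes, low=0.02, high=0.35):
    """Keep a batch only if its fertile fruit frequency lies between 2% and 35%."""
    return low < fertile_fraction(fruit_classes) < high


# Hypothetical batch: 100 scored fruits, 20 of them fertile.
batch = ["Aa"] * 60 + ["As"] * 20 + ["B"] * 15 + ["C"] * 5
print(classify_fruit(40, 10), fertile_fraction(batch), treatment_is_kept(batch))
```

Expressing the class boundaries as aborted-seed fractions rather than ratios avoids ambiguity for fruits that fall between two of the integer ratios listed in the text, such as 3.5:1.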

Development of mutant and DNA collections
Five to sixteen M2 offspring seedlings were grown from each M1 individual and tissue was collected from a single M2 fertile plant. M3 seeds of each M2 selected plant were individually harvested and stored. To ensure enough tissue and M3 offspring seeds from a single M2 plant, each family was grown in a 0.9 l pot that was maintained until