Mapping and screening of the tomato Stemphylium lycopersici resistance gene, Sm, based on bulked segregant analysis in combination with genome resequencing

Background Tomato gray leaf spot disease, caused by Stemphylium lycopersici (S. lycopersici), is one of the major diseases of cultivated tomatoes. The only known S. lycopersici resistance gene, Sm, was derived from the wild tomato species S. pimpinellifolium. Sm is an effective source of gray leaf spot resistance in tomatoes and has been mapped to tomato chromosome 11. In this study, bulked segregant analysis (BSA) combined with genome resequencing was used for the first time to map Sm and screen for Sm candidate genes. Results Based on the resequencing results, we identified 50,968 Diff-markers, most of which were distributed on chromosome 11. A total of 37 genes were located in a 0.26-Mb interval. The loci of these genes in the resistant and susceptible lines were successfully sequenced from PCR products. The relative expression levels of the candidate genes in the resistant and susceptible lines were measured by qRT-PCR, which identified Solyc11g011870.1.1 and Solyc11g011880.1.1 as candidates. A marker, D5, which cosegregated with the resistance locus, was developed based on a mutation in Solyc11g011880.1.1 in the resistant line. Conclusions The Sm gene was mapped to the short arm of chromosome 11. The candidate genes Solyc11g011870.1.1 and Solyc11g011880.1.1 displayed expression patterns related to the resistance response. This study will be valuable for Sm cloning and for Sm-based resistance breeding in tomato. Electronic supplementary material The online version of this article (10.1186/s12870-017-1215-z) contains supplementary material, which is available to authorized users.


Background
Gray leaf spot is a common and destructive disease of plants such as pepper [1], cotton [2], spinach [3] and eggplant [4]. In tomato, it is caused by four species of Stemphylium, including Stemphylium solani, Stemphylium floridanum and Stemphylium lycopersici [5]. It is a major disease of cultivated tomatoes and has threatened tomato-growing areas worldwide [6]. In the early stages, symptoms appear as brownish-black specks that later expand into necrotic lesions with gray centers and dark brown borders. As the disease progresses, the affected leaves become chlorotic and the lesions develop perforated centers, ultimately causing the leaves to dry and fall off. S. lycopersici has been shown to cause tomato gray leaf spot disease based on morphological and molecular identification [7]. However, the disease is becoming increasingly difficult to control in tomato-growing regions around the world. Breeding for gray leaf spot resistance therefore offers an attractive alternative to chemical control.
To date, Sm is the only gray leaf spot resistance gene identified in tomato; it is a dominant gene that has been mapped to chromosome 11 [8]. Sm was derived from the wild tomato species S. pimpinellifolium and has been used to breed resistant tomato cultivars [9]. The Sm gene is considered an effective source of gray leaf spot resistance in tomatoes. To our knowledge, however, few studies have been conducted on this resistance gene.
Bulked segregant analysis (BSA) was first proposed by Michelmore et al. [10] and is considered an effective method for identifying markers linked to a target gene [11][12][13]. It relies on genotyping only two bulked DNA samples, each pooled from individuals with a distinct phenotype, here resistant or susceptible [14]. Genome resequencing, which is based on high-throughput sequencing, is a recently developed high-resolution strategy for discovering and genotyping single nucleotide polymorphism (SNP), insertion-deletion (InDel) and structural variation (SV) markers [15].
Notably, this is the first time that bulked segregant analysis combined with genome resequencing technology has been used for the mapping and screening of candidate genes for the tomato Stemphylium resistance gene Sm. This study will be valuable for Sm cloning and resistance breeding in tomato.

Plant materials and S. lycopersici inoculation
The resistant line Motelle (P1, female parent), which carries the Sm gene, was crossed with the susceptible line Moneymaker (P2, male parent); both lines were kindly provided by the Chinese Academy of Agricultural Sciences. The resulting F1 plants were self-pollinated to produce F2 seeds, and BC1 plants were obtained from a backcross between Motelle and F1 plants. Seedlings were grown in a greenhouse under favorable conditions. S. lycopersici was plated on potato dextrose agar (PDA) in Petri dishes and incubated at 28°C for 5-10 days with a 12-h photoperiod. Tomato seedlings were sprayed with a conidial suspension (1 × 10⁴ conidia/ml), and mock-treated plants were sprayed with sterilized water. All seedlings were then grown in the greenhouse at 28°C with a relative humidity >85% [14]. Symptoms of gray leaf spot disease appeared 3-5 days later.
At the 3-4 leaf stage, all P1, P2, F1, F2 and BC1 plants were inoculated with S. lycopersici, and the disease index was evaluated 7 days post inoculation [16]. Symptom severity was assessed visually on a 1-5 scale: 1, no symptoms; 2, rare lesions; 3, few lesions; 4, numerous lesions; and 5, coalescing lesions. Plants with a disease index of 0-1 were regarded as resistant, and those with a disease index ≥2 as susceptible.
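The scoring rule above can be expressed as a small classification function. This is a minimal sketch that assumes each plant receives a single integer score on the 1-5 scale and reads "disease index ≥2" as "score ≥2"; the function names and example scores are hypothetical, not data from this study.

```python
from collections import Counter


def classify_plant(score: int) -> str:
    """Classify one plant from its 1-5 visual score (assumed integer)."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    # A score of 1 (no symptoms) falls in the resistant class (index 0-1);
    # any score of 2 or higher falls in the susceptible class.
    return "resistant" if score < 2 else "susceptible"


def tally_phenotypes(scores):
    """Count resistant and susceptible plants in a list of scores."""
    return Counter(classify_plant(s) for s in scores)


# Hypothetical scores for six plants, for illustration only.
print(tally_phenotypes([1, 1, 3, 5, 2, 1]))
```

Keeping the threshold in one function makes it easy to re-score a population if a different resistant/susceptible cut-off is adopted.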
After inoculation, the F2 plants were used for genetic analysis and bulked segregant analysis (BSA). The parents (P1 and P2) were prepared for genome resequencing, and the F2 plants for the detection of molecular markers. The resistant pool (F2 R-pool, 25 resistant plants) and the susceptible pool (F2 S-pool, 25 susceptible plants) were built by selecting resistant and susceptible plants from the F2 population [17]. DNA was extracted from young leaves of the parents and the F2 plants using the CTAB method. Bulked DNA samples were prepared by mixing equal amounts of CTAB-extracted DNA from each plant to a final concentration of 200 mg [18].
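Mixing "equal amounts" of DNA means each plant contributes the same mass, so the volume taken from each extract depends on its concentration. The sketch below assumes concentrations measured in ng/µl and a chosen per-plant mass; plant IDs, concentrations and the per-plant mass are hypothetical illustration values.

```python
def pooling_volumes(conc_ng_per_ul, mass_per_plant_ng):
    """Volume (µl) of each plant's DNA needed to contribute mass_per_plant_ng."""
    volumes = {}
    for plant, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{plant}: concentration must be positive")
        # volume = desired mass / concentration
        volumes[plant] = mass_per_plant_ng / conc
    return volumes


# Hypothetical concentrations for three plants of the R-pool.
example = {"R01": 50.0, "R02": 100.0, "R03": 80.0}
for plant, vol in pooling_volumes(example, 200.0).items():
    print(f"{plant}: {vol:.2f} µl")
```

Equal-mass pooling keeps every plant's alleles equally represented, which is what the pool-level allele frequencies used later in BSA rely on.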

Resequencing and association analysis
The genomic DNA of each individual plant was extracted and randomly fragmented. Adapter ligation and DNA cluster preparation were then performed, followed by sequencing on a HiSeq 2000 platform. SOAP2 (http://soap.genomics.org.cn/soapaligner.html) was used to map the sequencing reads to the reference genome sequence [19], and the sequencing depth and coverage relative to the reference genome were calculated from the alignments. SNPs and InDels were detected using SOAPsnp (http://soap.genomics.org.cn/soapsnp.html) and SOAPindel, respectively. SNPs (including synonymous and non-synonymous mutations) and InDels were annotated according to the tomato reference genome (estimated size 0.9 Gb) [20].
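Depth and coverage, as computed from the alignments, can be illustrated on a per-position depth profile. This sketch assumes such a profile (one read count per reference position, zeros included) has already been derived from the alignments; it defines coverage as the fraction of reference positions covered by at least one read. The toy profile is hypothetical.

```python
def depth_and_coverage(per_base_depth):
    """Return (mean depth, fraction of positions covered by >= 1 read)."""
    genome_size = len(per_base_depth)
    if genome_size == 0:
        raise ValueError("empty depth profile")
    # Mean depth: total aligned bases divided by reference length.
    mean_depth = sum(per_base_depth) / genome_size
    # Coverage: share of reference positions with at least one read.
    covered = sum(1 for d in per_base_depth if d > 0)
    return mean_depth, covered / genome_size


# Toy five-base "genome" for illustration.
print(depth_and_coverage([0, 2, 4, 0, 4]))
```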
In this study, an average sequencing depth of at least 20× was generated for each sample to detect all SNPs, with at least 1× for each progeny plant in the pools (25 plants per pool) [15]. Raw sequence reads (150 bp in length) were filtered and trimmed for quality control and adapter removal [15]. A library was constructed for each sample, and the clean Q20 values of all four libraries (P1, P2, F2 R-pool and F2 S-pool) were greater than 96%, indicating high data quality. Primary sequencing data were cleaned by removing reads containing adapters and low-quality reads, which were defined based on the number of bases with a quality score <20. The remaining high-quality reads were used for mapping [21]. SNPs were called between the two parental lines, retaining only markers with a depth of at least 5 and a base quality >20. Only SNP markers that were homozygous in each parent and differed between the two parents were used for further analysis [22,23].
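The parental marker filter described above (depth of at least 5, base quality above 20, homozygous in each parent, different between parents) can be sketched as a single predicate. The two-letter genotype format and the `ParentCall` record are assumptions for illustration; SOAPsnp's actual output format is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ParentCall:
    genotype: str        # two-letter diploid call, e.g. "AA" or "AG" (assumed format)
    depth: int           # number of reads covering the site
    base_quality: float  # Phred-scaled base quality


def is_diff_marker(p1, p2, min_depth=5, min_quality=20.0):
    """True if both parents pass the filters, are homozygous, and differ."""
    for call in (p1, p2):
        # Depth must be no less than min_depth; quality must exceed min_quality.
        if call.depth < min_depth or call.base_quality <= min_quality:
            return False
        if call.genotype[0] != call.genotype[1]:
            return False  # heterozygous call, not usable as a parental marker
    return p1.genotype != p2.genotype


# Hypothetical calls at one site in Motelle (P1) and Moneymaker (P2).
print(is_diff_marker(ParentCall("AA", 12, 35.0), ParentCall("GG", 9, 30.0)))
```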
Because the resistant phenotype is dominant, allele ratios at markers unlinked to Sm should be similar in the resistant and susceptible pools [24], whereas at markers linked to Sm only the P2 genotype should be found in the susceptible pool. The SNP index is a marker association method used to assess differences in genotype frequencies between pooled samples [25]. The SNP indexes used for the resistant and susceptible pools were 0.25 and 0.75, respectively, and the corresponding SNPs were used to analyze the regions associated with candidate genes [24]. The resequencing and association analysis were performed by BGI Tech (Shenzhen, China).
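The SNP-index logic can be sketched as follows. Here the index is assumed to be the fraction of reads at a site that carry the Moneymaker (P2) allele, which is one common convention; the function names and read counts are hypothetical. Under idealized Mendelian expectations for a dominant resistance gene, a marker tightly linked to Sm would show an index near 1 in the S-pool (all rr plants) and near 1/3 in the R-pool (RR and Rr plants in a 1:2 ratio), whereas an unlinked marker would be near 0.5 in both pools.

```python
def snp_index(p2_reads, total_reads):
    """Fraction of reads at a site that carry the Moneymaker (P2) allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return p2_reads / total_reads


def delta_snp_index(r_pool, s_pool):
    """S-pool index minus R-pool index; each pool is (p2_reads, total_reads)."""
    return snp_index(*s_pool) - snp_index(*r_pool)


# Hypothetical read counts at a site tightly linked to Sm.
r_pool = (10, 30)   # about one third P2 allele in the resistant bulk
s_pool = (25, 25)   # only the P2 allele in the susceptible bulk
print(round(snp_index(*r_pool), 2), round(delta_snp_index(r_pool, s_pool), 2))
```

Unlinked sites should give a difference close to zero, so scanning this difference along the genome highlights the region that cosegregates with resistance.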

Parental genome sequencing and candidate gene screening
The alignment results were used to calculate the average sequencing depth and coverage [26,27], and all SNP and InDel polymorphisms were detected from the mapped reads. In this part of the study, DNA from the parents was used for sequencing, read mapping, and analysis of SNPs and InDels [28]. The Heinz 1706 tomato genome [20] was used as the reference for read mapping. All SNPs in the association region were used for BSA combined with genome resequencing. Genes in the association region were annotated using the NCBI and SGN (http://solgenomics.net/) databases. SNPs that were consistent with the susceptible genotype in the F2 S-pool were used to narrow the candidate region.

Quantitative real-time PCR analysis of candidate genes
Candidate gene expression was analyzed by qRT-PCR. P1 and P2 plants were inoculated at the 3-4 true leaf stage, and young leaves were collected at 0, 3, 5 and 8 days after inoculation. Total RNA was extracted with TRIzol reagent, with three biological replicates. Reverse transcription was performed using the TaKaRa M-MLV Reverse Transcriptase (RNase H-) kit according to the manufacturer's instructions.
Each 20-μL qRT-PCR mixture contained 10 μL of 2× TransStart Top Green qPCR SuperMix (TransGen, China), 1 μL of each primer, 2 μL of cDNA template (1:5 dilution) and sterile distilled water to volume. The thermal cycling conditions were 95°C for 10 min, followed by 40 cycles of 95°C for 5 s, 59°C for 15 s and 72°C for 30 s. To detect primer dimers or other amplification artifacts, a melting-curve analysis was performed immediately after amplification (95°C for 15 s and 55°C for 15 s, then a slow increase of 0.5°C per cycle to 95°C with continuous fluorescence measurement). Relative expression was calculated using the 2^−ΔΔCT method (Livak and Schmittgen, 2001), with EFa1 (R: 5'-CCACCAATCTTGTACACATCC-3'; S: 5'-AGACCACCAAGTACTACTGCAC-3′) as the reference gene for normalization.
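The 2^−ΔΔCT calculation named above can be written out as follows. This is a sketch under the method's usual assumption of roughly 100% amplification efficiency for both the target and the reference gene; it also assumes the 0-day sample serves as the calibrator and that technical or biological replicate Ct values are averaged before taking differences, which is one of several conventions. All Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCt method (assumes ~100% PCR efficiency).

    ct_target / ct_ref: Ct values of the candidate gene and reference gene
    (e.g. EFa1) in the sample; *_cal: the same for the calibrator (e.g. 0 dpi).
    """
    delta_sample = mean(ct_target) - mean(ct_ref)           # dCt of the sample
    delta_cal = mean(ct_target_cal) - mean(ct_ref_cal)      # dCt of the calibrator
    ddct = delta_sample - delta_cal
    return 2 ** (-ddct)


# Hypothetical Ct values: a sample at 5 dpi versus the 0 dpi calibrator.
print(relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                          [26.0, 26.0, 26.0], [18.0, 18.0, 18.0]))
```

A ΔΔCT of −2 corresponds to a four-fold increase relative to the calibrator, since each cycle represents one doubling when efficiency is close to 100%.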
Sequencing of candidate gene loci and DNA sequence analysis
Primer 5.0 software was used to design primers for the candidate genes, and the primer sequences are shown in Table 4. Reference genome sequences of the candidate genes were obtained from SGN.
Parental PCR products were purified with a PCR purification kit (Takara) and cloned into the pMD18-T vector (Takara) for sequencing. All sequenced fragments were deposited in GenBank.
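Comparing the parental sequences of a locus comes down to scanning an alignment for mismatched columns. This sketch assumes the two sequences have already been aligned to equal length (for example by a pairwise aligner) with "-" marking gaps; gap columns are treated as InDels and skipped, and positions are reported 1-based. The sequences are hypothetical.

```python
def snp_positions(seq_a, seq_b):
    """1-based alignment columns where two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        # Gap columns ('-') are InDels, not SNPs, so they are skipped.
        if a != b and "-" not in (a, b):
            positions.append(i)
    return positions


# Hypothetical aligned fragments from Motelle and Moneymaker.
print(snp_positions("ACGTAC-TG", "ACTTACGTA"))
```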

Marker development and linkage analysis
To develop additional polymorphic markers between Motelle and Moneymaker based on sequence variation in or near the resistance locus, the sequences of the candidate genes were obtained from SGN, and marker primers were designed using Primer 5.0 software. After PCR and sequencing, eight markers, including CAPS and SCAR markers, were developed and used to screen an F2 population of 519 plants for linkage analysis.
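The text does not state which method or software was used for the linkage analysis. As one simple illustration, not necessarily the authors' approach, the recombination fraction between a codominant marker and a dominant resistance gene can be estimated from the susceptible F2 plants alone: each susceptible (rr) plant carries two non-recombinant gametes at Sm, so every Motelle (P1) marker allele it carries marks a recombinant gamete. The sketch assumes complete dominance and penetrance; the call labels "P1", "H" and "P2" and the example data are hypothetical.

```python
def recombination_fraction(susceptible_marker_calls):
    """Estimate r from codominant marker calls in susceptible (rr) F2 plants."""
    if not susceptible_marker_calls:
        raise ValueError("no plants supplied")
    # Number of Motelle (P1) marker alleles per call; each one marks a
    # recombinant gamete because a susceptible plant carries two r gametes.
    p1_alleles = {"P2": 0, "H": 1, "P1": 2}
    recombinant_gametes = 0
    for call in susceptible_marker_calls:
        if call not in p1_alleles:
            raise ValueError(f"unexpected marker call: {call}")
        recombinant_gametes += p1_alleles[call]
    return recombinant_gametes / (2 * len(susceptible_marker_calls))


# Hypothetical calls: 8 susceptible plants homozygous P2, 2 heterozygous.
print(recombination_fraction(["P2"] * 8 + ["H", "H"]))
```

Using only the recessive class avoids having to distinguish RR from Rr resistant plants by phenotype, at the cost of using fewer plants.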

Identification of the pathogen and genetic analysis of the Sm gene
In this study, genomic DNA of the pathogen was extracted, and the internal transcribed spacer (ITS) regions were amplified and sequenced with primers ITS1 and ITS4. All sequences were submitted to GenBank (accession nos. KX858848, KX858849). BLAST searches revealed that all sequences shared 99% identity with S. lycopersici [29]. Figure 1a shows the symptoms of gray leaf spot disease on tomato leaves. Based on morphological characteristics and molecular identification, we confirmed that the disease was caused by S. lycopersici. Motelle and F1 plants were resistant to S. lycopersici, whereas Moneymaker plants were susceptible. Resistant and susceptible plants segregated in a 3:1 ratio in the F2 population and in a 1:1 ratio in the BC1 population (Table 1).
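Whether observed counts fit a 3:1 or 1:1 ratio is usually checked with a chi-square goodness-of-fit test. Table 1 is not reproduced here, so the counts below are placeholders, not the study's data. The sketch handles two phenotypic classes (1 degree of freedom), applies no continuity correction, and compares against the 5% critical value of 3.841.

```python
CHI2_CRITICAL_1DF = 3.841  # 5% critical value for 1 degree of freedom


def chi_square(observed, ratio):
    """Chi-square statistic for observed class counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed, ratio):
    """True if two-class counts are consistent with the ratio at the 5% level."""
    return chi_square(observed, ratio) < CHI2_CRITICAL_1DF


# Placeholder counts (resistant, susceptible), for illustration only.
print(fits_ratio([75, 25], [3, 1]), fits_ratio([60, 40], [1, 1]))
```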

Parental genome sequencing and candidate gene screening
In this study, parental genome resequencing and F2 bulked segregant analysis yielded 67 Gb of data (447 M reads) and 89 Gb of data (593 M reads), respectively (Table 2).
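As a rough consistency check on these totals, dividing total bases by read count gives the mean read length, and dividing by genome size gives the mean depth. The sketch uses the rounded totals reported above, so the results are approximate. The per-parent depth assumes, purely for illustration, that the 67 Gb of parental data were split evenly between P1 and P2, which Table 2 may not reflect.

```python
def mean_read_length(total_bases, n_reads):
    """Average read length in bp."""
    return total_bases / n_reads


def mean_depth(total_bases, genome_size):
    """Average sequencing depth over the reference genome."""
    return total_bases / genome_size


parental = mean_read_length(67e9, 447e6)
bulks = mean_read_length(89e9, 593e6)
print(f"{parental:.0f} bp, {bulks:.0f} bp")
# Assumption: the 67 Gb is split evenly between the two parents.
print(f"{mean_depth(67e9 / 2, 0.9e9):.1f}x per parent")
```

Both read-length estimates come out at about 150 bp, consistent with the 150-bp read length stated in the methods.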
Based on the resequencing results, 50,968 Diff-markers were obtained between the parent lines, 46,941 of which were located on chromosome 11. The distribution of polymorphic markers (green lines) on the 12 chromosomes was plotted according to their positions on the reference genome (Fig. 2). A region was considered an association region if it contained three or more consecutive Diff-markers. We found that 37 genes were located in the association region on chromosome 11 (Table 3). SNPs that were consistent with the susceptible genotype in the F2 S-pool were used to narrow the candidate region.
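The rule that an association region requires three or more consecutive Diff-markers can be sketched as a run-length scan along a chromosome. The sketch assumes each marker has already been flagged as associated or not (for example by an SNP-index threshold); the flagging criterion, function name and positions are hypothetical.

```python
def association_regions(markers, min_run=3):
    """Return (start, end) positions of runs of >= min_run associated markers.

    markers: iterable of (position_bp, is_associated) pairs on one chromosome.
    """
    regions = []
    run = []
    for position, associated in sorted(markers):
        if associated:
            run.append(position)
            continue
        # A non-associated marker ends the current run.
        if len(run) >= min_run:
            regions.append((run[0], run[-1]))
        run = []
    if len(run) >= min_run:  # a run that reaches the last marker
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical markers: only the run from 400 to 600 bp has three in a row.
example = [(100, True), (200, True), (300, False),
           (400, True), (500, True), (600, True), (700, False)]
print(association_regions(example))
```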
To further narrow the candidate region, SNPs in the association region were screened and analyzed using the resequencing data combined with the F2 BSA data. All SNPs, including 8 with an SNP index close to 0.33, were located within a 0.26-Mb interval (Table 4) (Additional file 1: Table S1). Based on the results of the association analysis and gene function annotation, six genes were screened out: Solyc11g011520. (Fig. 3).

Quantitative real-time PCR analysis
The relative expression levels of the candidate genes in Motelle and Moneymaker were determined using qRT-PCR. Solyc11g011870.1.1 and Solyc11g011880.1.1 showed expression patterns related to the resistance response (Fig. 4). The primer sequences for all candidate genes are listed in Table 4. Both genes were expressed at a low level before inoculation, and their expression increased slightly after inoculation. Expression then rose rapidly from 5 days after inoculation and continued to increase thereafter; compared with 0 days, expression of both genes was elevated at 5 and 8 days after inoculation. In contrast, the expression patterns of the other four genes were unrelated to the resistance response. Thus, the qRT-PCR results indicate that the expression of Solyc11g011870.1.1 and Solyc11g011880.1.1, but not of the other four candidates, is consistent with a role in resistance.
In a previous study, we performed a transcriptome analysis of the Sm-mediated resistance response to S. lycopersici in tomatoes. Our RNA-Seq results showed that Solyc11g011870.

Candidate gene sequencing and sequence analysis
The Solyc11g011870.1.1 and Solyc11g011880.1.1 loci of Motelle and Moneymaker were successfully sequenced from PCR products (GenBank: MF059094, MF059095, MF059096, MF059097). The loci of all other genes in the association region were also successfully sequenced from Motelle and Moneymaker PCR products. Sequence alignment showed that thirty-two of these gene loci contained SNPs between Motelle and Moneymaker. There were several SNPs located in coding regions at some candidate

PCR validation and marker development
The SCAR marker D5 (forward primer: 5′-CCCGTGGCACTACAACTCTT-3′; reverse primer: 5′-TCTGCTTTCGCTCTGCTTGA-3′) cosegregated with the resistance locus. D5 was designed based on a 56-bp insertion in the resistance allele of Motelle. An 876-bp fragment was amplified from resistant plants and an 820-bp fragment from susceptible plants, while both fragments were amplified from F1 plants (Fig. 6).
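Scoring D5 on a gel amounts to mapping the observed band sizes to a genotype and then checking that genotype against the plant's phenotype. This sketch assumes band sizes are estimated in bp and uses a hypothetical 10-bp sizing tolerance (well under half of the 56-bp difference, so the two alleles cannot be confused); the function names and genotype labels are illustrative.

```python
R_BAND_BP = 876  # amplicon from the Motelle (resistant) allele
S_BAND_BP = 820  # amplicon from the Moneymaker (susceptible) allele


def call_d5_genotype(band_sizes_bp, tolerance_bp=10):
    """Classify a plant from the estimated sizes of its D5 bands."""
    has_r = any(abs(b - R_BAND_BP) <= tolerance_bp for b in band_sizes_bp)
    has_s = any(abs(b - S_BAND_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_r and has_s:
        return "heterozygous"
    if has_r:
        return "Motelle-type"
    if has_s:
        return "Moneymaker-type"
    return "no call"


def consistent_with_phenotype(genotype, phenotype):
    """With dominant resistance, resistant plants carry at least one 876-bp allele."""
    if phenotype == "resistant":
        return genotype in ("Motelle-type", "heterozygous")
    if phenotype == "susceptible":
        return genotype == "Moneymaker-type"
    raise ValueError(f"unknown phenotype: {phenotype}")


# Hypothetical F2 plant showing both bands but scored as susceptible.
g = call_d5_genotype([878, 819])
print(g, consistent_with_phenotype(g, "susceptible"))
```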

Discussion
The Sm gene maps to chromosome 11
In 1991, the Sm gene was assigned to the long arm of chromosome 11 between TG110 and T10. In this study, the Sm gene was mapped to the short arm of chromosome 11, where one association region was identified by F2 bulked segregant analysis in combination with genome resequencing. Although some Diff-markers were found on the long arm of chromosome 11, they did not form association regions in the association analysis. To the best of our knowledge, few studies of this gene have been performed in recent years.
The 56-bp insertion of Solyc11g011880.1.1 in Motelle in the coding region changes the conserved domain
Pathogens carry virulence and avirulence (AVR) genes, and host plants carry resistance (R) and susceptibility (S) genes. A plant develops induced resistance only when a pathogen carrying an avirulence gene infects a host plant carrying the corresponding R gene; otherwise, the plant becomes infected [30]. However, it has also been suggested theoretically that plants carrying either R or S genes have a potential for resistance. The specificity between plants and pathogens is determined at the level of receptor recognition, and plant R genes are therefore thought to encode receptor proteins. Thus, whether induced disease resistance occurs depends on mutual plant-pathogen and gene-for-gene recognition. Plant disease resistance is generally explained by the gene-for-gene hypothesis [31], which states that when the product of a pathogen avirulence (AVR) gene is specifically recognized, an R gene encoding a receptor protein initiates the resistance response. Stress-antifung domain proteins are considered effective in resistance to fungal diseases. Ginkbilobin-2 (Gnk2), an antifungal protein identified in the endosperm of Ginkgo seeds, was found to inhibit the growth of phytopathogenic fungi (e.g., Fusarium oxysporum). Previous studies indicated that Gnk2 is very similar to the extracellular domain of cysteine-rich receptor-like kinases (CRKs) in Arabidopsis. These findings also demonstrated that CRKs could be induced by pathogen infection, a series of responses to reactive oxygen species or salicylic acid as a component
In conclusion, based on the RNA-Seq results and the bulked segregant analysis combined with genome resequencing, the candidate gene Solyc11g011880.1.1 in the Motelle line may be our target gene, Sm.
Because two candidate genes, Solyc11g011870.1.1 and Solyc11g011880.1.1, were found in this target region, the functions of both will be verified through virus-induced gene silencing (VIGS). Functional verification of the candidate genes in the resistant tomato line Motelle is currently ongoing in our laboratory.
The marker D5 can be used in marker-assisted selection (MAS) breeding
The marker D5 cosegregated with the resistance locus: an 876-bp fragment was amplified in resistant plants and an 820-bp fragment in susceptible plants, while both fragments were amplified in F1 plants. When the marker was tested in F2 individuals, however, five showed a genotype inconsistent with their phenotype. One possible explanation we considered is that resistance is controlled by an incompletely dominant gene.
Because several plants gave D5 results inconsistent with their phenotypes, the D5 marker should be further validated for genotype identification before it is used in tomato MAS breeding. These results will provide a basis for future MAS breeding and for studies on the mechanism of tomato gray leaf spot resistance.

Conclusions
In this study, F2 bulked segregant analysis combined with genome resequencing was used to locate the Sm gene. A total of 50,968 Diff-markers were obtained, most of which were distributed on chromosome 11. A 0.26-Mb region containing 37 genes was identified from the resequencing results. qRT-PCR identified the candidate genes Solyc11g011870.1.1 and Solyc11g011880.1.1, whose expression patterns were related to the resistance response. This study will provide a basis for Sm cloning and for the application of the Sm gene in breeding.

Additional file
Additional file 1: Table S1.