Simultaneous induction of mutant alleles of two allergenic genes in soybean by using site-directed mutagenesis

Background Soybean (Glycine max) is a major protein crop, because soybean protein has an amino acid score comparable to that of beef and egg white. However, many allergens have been identified among soybean proteins. A decrease in allergenic protein levels would be useful for expanding the market for soybean proteins and processed foods. Recently, the CRISPR/Cas9 system has been adopted as a powerful tool for site-directed mutagenesis in higher plants. This system is expected to generate hypoallergenic soybean varieties.
Results We used two guide RNAs (gRNAs) and Agrobacterium-mediated transformation for simultaneous site-directed mutagenesis of two genes encoding the major allergens Gly m Bd 28 K and Gly m Bd 30 K in two Japanese soybean varieties, Enrei and Kariyutaka. We obtained two independent T0 Enrei plants and nine T0 Kariyutaka plants. Cleaved amplified polymorphic sequence (CAPS) analysis revealed that mutations were induced in both targeted loci of both soybean varieties. Sequencing analysis showed that deletions were the predominant mutation type in the targeted loci. Cas9-free plants carrying the mutant alleles of the targeted loci, with the transgenes excluded by genetic segregation, were obtained in the T2 and T3 generations. Variable mutational spectra were observed in the targeted loci even in T2 and T3 progenies of the same T0 plant. Induction of multiple mutant alleles resulted in six haplotypes in the Cas9-free mutants derived from one T0 plant. Immunoblot analysis revealed that no Gly m Bd 28 K or Gly m Bd 30 K protein accumulated in the seeds of the Cas9-free plants. Whole-genome sequencing confirmed that a Cas9-free mutant also carried no other foreign DNA from the binary vector. Our results demonstrate the applicability of the CRISPR/Cas9 system for the production of hypoallergenic soybean plants.
Conclusions Simultaneous site-directed mutagenesis by the CRISPR/Cas9 system removed two major allergenic proteins from mature soybean seeds. This system enables rapid and efficient modification of seed components in soybean varieties. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-020-02708-6.


Background
Soybean (Glycine max, 2n = 2x = 40) is one of the most important protein crops used for food and forage worldwide, because its seeds contain high-quality proteins with an amino acid score comparable to that of beef and egg white [1]. Diverse soybean proteins are responsible for the physical properties of foods and other products made from soybean seeds [2,3]. In the USA and Europe, 5 to 8% of babies and 2% of adults are allergic to soybean [4]. Several subunits of major storage proteins such as 7S and 11S globulins and 2S albumin are representative soybean allergens [5]. The vicilin-like glycoprotein Gly m Bd 28 K and the oilbody-associated protein Gly m Bd 30 K are also reported as major soybean allergens [6,7]. Hydrophobic proteins Gly m 1A and Gly m 1B and the hull protein Gly m 2 are related to asthma outbreaks in Spain [8,9]. Profilin Gly m 3 and the pathogenesis-related protein Gly m 4 are cross-reactive with antigens from other sources involved in sensitization and symptom induction [10,11]. A positive allergic response to soybean protein has been reported in 14% of patients diagnosed with food allergies with atopic dermatitis [12]. Therefore, development of hypoallergenic soybean varieties or establishment of a procedure to remove allergens would be useful for expanding the market for soybean proteins and processed foods.
Protein fractionation on the basis of the differences in protein solubility at different salt concentrations and pH can be used to characterize the biochemical and physical properties of proteins [13][14][15]. This technique is also used for the removal of specific allergens from soy foods. Gly m Bd 30 K was efficiently removed from soy milk by acidifying it to pH 4.5 with 1 M Na2SO4 [16].
Genetic improvement of soybean is achieved by crossing plants carrying allergen-deficient alleles from soybean genetic resources or by mutagenesis to generate allergen-deficient mutant alleles. A number of spontaneous or induced mutants deficient in subunits of 7S or 11S globulins have been reported [17][18][19][20]. Among the germplasm of wild soybean (G. soja), Hajika et al. [20] found one accession lacking the α-, α'-, and β-subunits of 7S globulin. The deficiency of these subunits is controlled by a single dominant gene (Scg-1), which is closely associated with post-transcriptional gene silencing [21]. To develop hypoallergenic soybean through crossing and subsequent back-crossing, this dominant gene has been introduced into an elite variety, Fukuyutaka [22]. The soybean variety Yumeminori lacks the α- and α'-subunits of 7S globulin and Gly m Bd 28 K, and has a decreased level of the β-subunit of 7S globulin; this variety has been developed through mutagenesis by gamma-ray irradiation [23]. Mutagenesis of the soybean variety VLSoy-2 by gamma-ray irradiation generated mutant lines lacking the A3-subunit of 11S globulin [24]. This mutagenesis also produced plants lacking the α- and α'-subunits of 7S globulin [24]. Stacking of recessive mutant alleles of the genes for Kunitz trypsin inhibitor, agglutinin, and Gly m Bd 30 K was performed in the genetic background of the soybean variety Williams 82 [25]. Proteome analysis revealed that the stacking of these mutant alleles markedly decreased the accumulation of these allergens [25].
The biotechnological approach can also help to decrease the accumulation of allergens in soybean seeds. Down-regulation of the gene encoding Gly m Bd 30 K greatly suppresses the accumulation of the targeted protein in seeds of transgenic soybean [26]. The accumulation of α-, α'-, and β-subunits of 7S globulin in soybean seeds can be greatly decreased through RNA interference or artificial microRNA systems [27,28]. Recently, the transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated endonuclease 9 (Cas9) systems have become the main platforms for site-directed mutagenesis in higher plants [29][30][31][32]. They enable the fine tuning of traits in soybean breeding when applied to various soybean varieties. The CRISPR/Cas9 system can be used to develop hypoallergenic soybean directly from elite varieties, because it has been optimized for various soybean varieties [33][34][35][36][37].
The subunits of 7S and 11S globulins are closely associated with seed characteristics important for food processing such as gel-forming and emulsifying properties [38][39][40]. To produce hypoallergenic soybeans without impairing the processing properties, we focused on two allergenic proteins, Gly m Bd 28 K and Gly m Bd 30 K, because no pyramiding of mutant alleles of these allergens in soybean has been reported. Here, we constructed a plasmid for simultaneous site-directed mutagenesis of these genes with the CRISPR/Cas9 system and used it for Agrobacterium-mediated transformation of two soybean varieties. Cas9-free plants carrying mutant alleles of the targeted loci, with the transgenes excluded by genetic segregation, were obtained in the T2 or T3 generations. Immunoblot analysis revealed that Gly m Bd 28 K and Gly m Bd 30 K proteins did not accumulate in seeds of the Cas9-free plants. Our results demonstrate the applicability of the CRISPR/Cas9 system for the production of hypoallergenic soybean plants.

Results
Generation of transgenic soybean plants harboring the CRISPR/Cas9 expression module
To conduct site-directed mutagenesis of soybean with the CRISPR/Cas9 system, we designed two guide RNAs (gRNAs) to mutagenize the Gly m Bd 28 K and Gly m Bd 30 K loci (Fig. 1a). Explants of Enrei and Kariyutaka were inoculated with Agrobacterium harboring the pMR284_28K_30K plasmid (Fig. 1b). Two Enrei T0 plants (E1 and E2) and nine Kariyutaka T0 plants (K1 to K10) were obtained. All the T0 plants set T1 seeds. In our previous study, many T0 plants produced by our soybean transformation system failed to transmit the transgenes into the T1 progeny [41]. This indicates that the T0 plants originated from chimeric tissues containing both transformed and non-transformed cells [41]. Therefore, we did not examine transgene integration or the induction of mutagenesis in the T0 plants, and grew representative T1 plants for further analyses. Plants of the T1 and T2 generations were named by giving the T0 individual number followed by the branch number for each generation.
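As an illustration of the first step in gRNA design, the sketch below lists candidate target sites in a DNA sequence. It assumes a 20-nt protospacer followed by an NGG protospacer-adjacent motif, the usual requirement for Streptococcus pyogenes Cas9; the text above does not state which PAM or design tool was used. The function name and the toy sequence are hypothetical and are not taken from either targeted locus.

```python
import re


def find_candidate_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each N20-NGG site on the forward strand.

    Assumption: an SpCas9-style NGG PAM; only the given strand is scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical toy sequence for demonstration only.
toy_sequence = "ATGCATGCATGCATGCATGCATGGTTT"
for start, protospacer, pam in find_candidate_targets(toy_sequence):
    print(start, protospacer, pam)
```

In practice, candidates found this way would still need to be screened for off-target matches elsewhere in the genome and, for CAPS genotyping, for overlap with a restriction site.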

Mutations in T 1 plants detected by cleaved amplified polymorphic sequence (CAPS) analysis
Twenty representative T1 Enrei plants and 25 representative T1 Kariyutaka plants derived from 12 T0 plants were grown, and the induction of mutagenesis in the targeted loci was evaluated by CAPS analysis of the genomic DNA. The DNA fragments were classified as wild-type or mutant-type based on the expected size; a fragment of unexpected size was also detected and considered as mutant-type (Fig. 2). Mutations were detected in both targeted loci in plants of both varieties (Fig. 2, Additional file 2: Figure S1, Additional file 1: Tables S1, S2). Integration of the transgene was also examined by PCR analysis with Cas9-specific primers. PCR analysis revealed that 11 T1 plants (24.4% of all T1 plants examined) were Cas9-free; among these, E1-4, E1-8, and E1-9 had mutant alleles in the Gly m Bd 30 K locus, whereas the others had the wild-type alleles of both targeted loci (Additional file 1: Tables S1, S2).

Transmission of mutations and transgenes to the T 2 generation
Because none of the Cas9-free T1 plants had mutations in both targeted loci, 13 representative T1 plants were advanced to the next generation. A total of 348 T2 seeds collected from the 13 T1 plants were evaluated for mutations in the targeted loci (Table 1). In CAPS analysis, 227 (65%) T2 seeds showed mutant-type fragments of both targeted loci (Table 1). Thus, the frequency of simultaneous site-directed mutagenesis in both targeted loci was much higher in the T2 generation than in the T1 generation (Table 1, Additional file 1: Tables S1, S2).

Development of double-mutant T 3 seeds
We used the heterozygous and biallelic mutants to develop more homozygous mutant alleles. We collected T3 seeds and sequenced both targeted loci. In total, 4 haplotypes in Enrei and 21 haplotypes in Kariyutaka were found in the double-mutants (Table 3). Deletions (1 to 43 nucleotides) were the most common mutations (Table 3, Additional file 2: Figure S5). Predicted amino acid sequences of Gly m Bd 28 K (Figure S6) and Gly m Bd 30 K (Figure S7) are shown in Additional file 2. Three mutant alleles (d3 and d6 for the Gly m Bd 28 K locus, and d6 for the Gly m Bd 30 K locus) had in-frame mutations (Additional file 2: Figures S6, S7). In the Gly m Bd 30 K locus, the 3-nucleotide deletion generated a stop codon at the mutation site, and the 33-nucleotide deletion was not predicted as an in-frame mutation, because the deleted region contained the splicing site (Additional file 2: Figure S7).
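The first-pass distinction between in-frame and frame-shift alleles depends only on whether the indel length is a multiple of three. The sketch below applies that rule to allele labels in the style used here (d = deletion, i = insertion, followed by the number of nucleotides). The function name is hypothetical, and the rule is deliberately naive: as noted above, a 3-nucleotide deletion can still create a stop codon, and a deletion spanning a splicing site is not in-frame in effect, so sequence-level inspection remains necessary.

```python
import re


def classify_indel(allele):
    """Classify a simple allele label such as 'd3' or 'i1' by indel length alone.

    Returns 'in-frame' when the length is a multiple of three, otherwise
    'frame-shift'. Compound labels (e.g. 'd1s1') are rejected because the
    length rule alone cannot describe them.
    """
    match = re.fullmatch(r"([di])(\d+)", allele)
    if match is None:
        raise ValueError(f"unsupported allele label: {allele!r}")
    size = int(match.group(2))
    return "in-frame" if size % 3 == 0 else "frame-shift"


for label in ["d1", "d2", "d3", "d5", "d6", "i1"]:
    print(label, classify_indel(label))
```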

Analysis of Gly m Bd 28 K and Gly m Bd 30 K proteins in mature double-mutant seeds
We selected two Enrei haplotypes (E-type1 and E-type3) and seven Kariyutaka haplotypes (K-type2, K-type4, K-type7, K-type9, K-type14, K-type15, and K-type19) from the double-mutants (Table 3), and examined the composition of crude protein fractions prepared from mature seeds. In the SDS-PAGE analysis, the Gly m Bd 30 K protein was visually detectable in Enrei and Kariyutaka but not in mutant seeds, whereas Gly m Bd 28 K was not detectable in any seeds (Fig. 4a). The double-mutant seeds had no signal bands that were not detected in wild-type seeds (Fig. 4a). To detect the Gly m Bd 28 K and Gly m Bd 30 K proteins specifically, immunoblot analysis was conducted in double-mutant and wild-type seeds. In the immunoblot analysis, the Gly m Bd 28 K and Gly m Bd 30 K proteins were detected only in seeds of wild-type Enrei or Kariyutaka, except that Gly m Bd 28 K was also detected in the E-type3 haplotype (Fig. 4b, c). No immunoreactive band of unexpected size was detected (Additional file 2: Figure S8).

Expression levels of the Gly m Bd 28 K and the Gly m Bd 30 K genes
To evaluate the expression levels of the Gly m Bd 28 K and the Gly m Bd 30 K genes, we extracted total RNA from the mature T3 seeds of two Enrei mutants (E-type1 and E-type3), seven Kariyutaka mutants (K-type2, K-type4, K-type7, K-type9, K-type14, K-type15, and K-type19), Enrei and Kariyutaka, and conducted semi-quantitative RT-PCR analysis of the region upstream of the mutation site (Additional file 2: Figure S9). Although amplified products of the 18S ribosomal RNA (18S rRNA) were detected at similar levels in all mature seeds of the mutants, Enrei, and Kariyutaka (Fig. 5), all mutants showed lower expression levels of the Gly m Bd 28 K and Gly m Bd 30 K genes than those of wild-type (Enrei and Kariyutaka) seeds (Fig. 5).
Table footnotes: The presence of mutations was evaluated by CAPS analysis. a, The letter and first number correspond to the number of the parental T0 plant

Whole-genome sequencing in T2 plants to validate the absence of foreign DNA
The whole-genome sequencing data of two T2 plants, K2-1-16 and K4-1-37, were searched for foreign DNA derived from the vector as described in [42]. Each 20-mer identical between the plant genome and the vector was detected (Fig. 6). The genome of K2-1-16 clearly showed significant signals in a vector-wide manner (Fig. 6a, c), whereas that of K4-1-37 had no signal of foreign DNA from the vector (Fig. 6b). A significant signal found in the G-statistic of K4-1-37 was considered a false positive, because it had a much lower value than that of K2-1-16.
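The idea of detecting 20-mers shared between sequencing reads and the vector can be sketched as follows. This is a minimal illustration only: it counts exact 20-mer matches per vector position on both strands and does not reproduce the G-statistic or the comparison with wild-type data used in the method of [42]. All names and the toy sequences are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_vector_hits(reads, vector, k=20):
    """For each k-mer start position in the vector, count matching read k-mers."""
    index = defaultdict(list)
    for pos in range(len(vector) - k + 1):
        index[vector[pos:pos + k]].append(pos)
    hits = [0] * (len(vector) - k + 1)
    for read in reads:
        # Reads can come from either strand, so both orientations are scanned.
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                for pos in index.get(strand[i:i + k], ()):
                    hits[pos] += 1
    return hits


# Hypothetical toy vector and reads (not real vector sequence).
toy_vector = "ACCACAAACCCACACAGCAACACCCCAACACAAC"
toy_reads = [
    toy_vector[5:30],                      # read carrying vector sequence
    reverse_complement(toy_vector[0:22]),  # vector-derived read, opposite strand
    "T" * 25,                              # read unrelated to the vector
]
print(count_vector_hits(toy_reads, toy_vector))
```

A plant that retains vector DNA would be expected to show hits across the integrated region, whereas a transgene-free plant should show none apart from sequences that the vector happens to share with the host genome.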

Morphological characteristics of double mutants
To assess the consequences of the site-directed mutagenesis in the targeted loci (Fig. 3), we examined the morphological characteristics of the T2 plant body and the size and shape of T3 seeds. No difference was detected between the double-mutants and wild-type in the plant and seed morphological characteristics (Additional file 2: Figures S10, S11).

Discussion
Gly m Bd 28 K and Gly m Bd 30 K are the major allergenic proteins in soybean seeds [6,43]. The mutant alleles of these loci have been identified by surveying the soybean germplasm or generated by gamma-ray irradiation mutagenesis [19,44,45], and stacking of these mutant alleles will enable development of hypoallergenic soybean lines. In contrast, site-directed mutagenesis mediated by the CRISPR/Cas9 system enables the induction of mutations directly in the targeted loci of desirable donor plants such as varieties and elite breeding lines. This approach dramatically shortens the breeding period and saves labor. In this study, we performed simultaneous site-directed mutagenesis of both Gly m Bd 28 K and Gly m Bd 30 K loci in two Japanese soybean varieties. A total of 14 T2 seeds possessed mutant alleles of both loci and had the Cas9 gene removed through genetic segregation (Table 3). Among all mutations, deletions were predominant and caused frame-shifts (Additional file 2: Figures S4-S6). The frame-shift mutations resulted in the deficiency in proteins recognized by the polyclonal antibodies against Gly m Bd 28 K and Gly m Bd 30 K proteins (Fig. 4). No bands of unexpected size were detected with either of these antibodies (Fig. 4). Frame-shift mutations in the targeted loci decreased the expression levels of the Gly m Bd 28 K and the Gly m Bd 30 K genes (Fig. 5). These findings suggest that the frame-shift mutations produce aberrant mRNAs from the targeted loci, which induce nonsense-mediated mRNA decay (NMD), as in a site-directed mutagenesis study conducted in Brassica carinata using the hairy root transformation system [46]. The lower expression level than in wild-type might result in the deficiency in proteins recognized by the polyclonal antibodies against Gly m Bd 28 K and Gly m Bd 30 K proteins. On the other hand, several T3 seeds had mutant alleles with putative in-frame mutations (Additional file 2: Figures S6, S7). The E-type3 haplotype with a 3-nucleotide deletion in the Gly m Bd 28 K locus showed a strong immunoreactive band with the antibody against the Gly m Bd 28 K protein, although the expression level of the Gly m Bd 28 K gene was lower than that in Enrei (Fig. 5). In this study, the expression level of the targeted loci was examined only in mature seeds of representative mutants and wild-type. Soybean seeds accumulate Gly m Bd 28 K and Gly m Bd 30 K proteins during seed filling [47]. Therefore, an investigation of the expression level of the targeted loci in immature seeds might lead to further understanding of the accumulation mechanism of mutant proteins.
Fig. 3 Mutational spectra of the targeted loci in double-mutant T2 seeds. Red and blue nucleotide sequences have the same meaning as those in Fig. 1. Green nucleotide denotes an insertion. Letters and numbers in parentheses indicate the type of mutation in the targeted locus: e.g., d1, a single-nucleotide deletion; i1, a single-nucleotide insertion; wt, no mutation. Control, reference sequence (Enrei or Kariyutaka)
At least three immunodominant epitopes in Gly m Bd 28 K and five in Gly m Bd 30 K have been identified [48][49][50]. In this study, gRNAs were designed against the fourth exon of Gly m Bd 28 K and the first exon of Gly m Bd 30 K (Fig. 1). Immunodetection of proteins generated by the in-frame mutations in T3 seeds would indicate the presence of proteins with preserved epitopes (Additional file 2: Figures S6, S7). Analysis of sera of soybean-allergic patients may further clarify the allergenic properties of soybean seeds generated in this study.
Multiple mutant alleles were detected in the progeny of one T0 plant (Fig. 7). Three mutant alleles (i1, d2, and d5) in the Gly m Bd 28 K locus and five mutant alleles (d1, d2, d5, d6, and d1s1) in the Gly m Bd 30 K locus were ascertained in the Cas9-free T2 and T3 seeds derived from the K1 T0 plant (Fig. 7). These mutations appeared after the T2 generation, when the distribution of mutant alleles in the targeted loci was validated in the genealogy of the K1 plant and its progeny (Fig. 7). Twelve haplotypes (K-type1 to K-type12) were consequently obtained in the Cas9-free T3 seeds (Fig. 7). Previously, we showed that simultaneous site-directed mutagenesis of duplicated loci using a single gRNA resulted in heterozygous and/or chimeric mutations in the targeted loci in most of the T1 plants [36]. On the other hand, the mutant alleles of multiple targeted loci have been induced in early generations such as T0 or T1 plants in other studies on soybean site-directed mutagenesis by the CRISPR/Cas9 system [37,51,52]. This difference might be explained by different growth and maturity habits of the soybean varieties used. Kariyutaka has early flowering and a short period of vegetative growth [53]; the latter might decrease the chance of mutations occurring in germ cells in the T0 generation but might produce multiple mutant alleles after the T1 generation. Therefore, site-directed mutagenesis using Kariyutaka might be a useful system for obtaining multiple mutant alleles in targeted genes efficiently in a limited number of transgenic soybean plants.

Conclusion
We used Agrobacterium-mediated transformation and two gRNAs for simultaneous site-directed mutagenesis of two allergenic genes, Gly m Bd 28 K and Gly m Bd 30 K, in two Japanese soybean varieties. Cas9-free plants that had mutant alleles of the targeted loci and transgenes excluded by genetic segregation were obtained in the T2 or T3 generation. Immunoblot analysis revealed that the double-mutants did not accumulate Gly m Bd 28 K or Gly m Bd 30 K protein in mature seeds. Our results showed that simultaneous site-directed mutagenesis by the CRISPR/Cas9 system removed two major allergenic proteins in mature soybean seeds.

Soybean transformation
The Japanese soybean varieties Enrei (JP 28862) and Kariyutaka (JP 86520) were obtained from Genebank, National Agriculture and Food Research Organization (https://www.gene.affrc.go.jp/index_en.php). Agrobacterium-mediated transformation was performed as described in [28], except that the concentration of glufosinate for selection of transformed cells was decreased from 6 mg/L to 4 mg/L for Enrei. Agrobacterium tumefaciens EHA105 harboring the plasmid pMR284_28K_30K was used. Transgenic plants were grown in commercial soil (Katakura Chikkarin Co., […]

[…] separated by electrophoresis in 2.0% agarose gels. DNA fragments showing the digestion pattern expected for a targeted region carrying mutations and those showing the pattern expected for a region with no mutations were considered as the mutant type and wild type, respectively. DNA fragments of unexpected size were also regarded as mutant type.
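The logic of calling CAPS genotypes from digestion patterns can be sketched as follows, assuming that the wild-type amplicon carries a restriction site at the target and that an induced mutation destroys it. The enzyme used in this study is not given in the text above, so the recognition site (GAATTC, cut after the first base) and the toy amplicons are purely hypothetical. In line with the rule described above, any pattern other than the wild-type one, including fragments of unexpected size, is called mutant type.

```python
def digest(amplicon, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of the site.

    Assumption: a hypothetical recognition site; cut_offset is the cut
    position counted from the first base of the site.
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def caps_call(amplicon, wild_type_pattern):
    """Call 'wild type' only when the digestion pattern matches the expected one."""
    return "wild type" if digest(amplicon) == wild_type_pattern else "mutant type"


# Hypothetical amplicons: the mutant carries a 2-nt deletion inside the site.
wild_type_amplicon = "A" * 30 + "GAATTC" + "C" * 40
mutant_amplicon = "A" * 30 + "GATC" + "C" * 40

expected = digest(wild_type_amplicon)
print(expected, caps_call(wild_type_amplicon, expected))
print(digest(mutant_amplicon), caps_call(mutant_amplicon, expected))
```

This sketch treats each amplicon as a single allele; a heterozygous or chimeric sample on a gel would show both the cut and uncut patterns at once.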

DNA sequencing
The targeted and flanking regions of the Gly m Bd 28 K and Gly m Bd 30 K loci were amplified with specific primers (Additional file 1: Table S3). The amplified products were cloned into the pGEM-T-Easy vector (Promega, Madison, USA) and sequenced with the Big Dye terminator cycle method using an ABI3100 or ABI3130 Genetic Analyzer (Thermo Fisher Scientific). DNA sequencing analysis was performed by the Instrumental Analysis Division, Graduate School of Agriculture, Hokkaido University.

Selection of Cas9-free plants
To confirm the integration of the Cas9 and gRNA expression module in the T1-T3 generations, PCR analysis was performed using primers specific for the Cas9 gene (Additional file 1: Table S3). Endogenous Glyma.01G214600 was amplified simultaneously as a positive control. The PCR was performed under the following conditions: 30 cycles of 94°C for 30 s, 54°C for 30 s and 72°C for 30 s. The presence of the Cas9 gene was determined by the presence of the amplified PCR product.

Protein analyses in mature seeds
Soy meal was collected from mature seeds. The extraction of crude protein and protein separation were performed as described in [28]. Proteins were separated by SDS-PAGE in a precast 5-12% gradient gel (ATTO, Tokyo, Japan) and transferred onto a PVDF membrane (Hybond-P; GE Healthcare, Little Chalfont, UK). Membranes were blocked with 5% skim milk (Wako) overnight at 4°C. Recombinant Gly m Bd 30 K was prepared using the baculovirus expression system as described in [5]. Using the pET52 vector (Merck-Millipore, Burlington, USA), His10-tagged Gly m Bd 28 K was expressed in Escherichia coli BL21(DE3). After sonication and centrifugation, Gly m Bd 28 K-containing pellets were dissolved in phosphate-buffered saline containing 8 M urea, and Gly m Bd 28 K was purified using a HisTrapFF crude column (GE Healthcare). Antisera were raised in rabbits against the recombinant proteins as described in [56]. Immunoreactive bands were detected with the antisera and the ECL Plus Western Blotting system (GE Healthcare).
Legend fragment: […] Table 3. In parentheses, the description to the left of "/" refers to the genotype of the Gly m Bd 28 K locus and the one to the right to the genotype of the Gly m Bd 30 K locus in all transgenic generations

Expression analysis by semi-quantitative RT-PCR
Total RNA was extracted from mature seeds of mutants, Enrei, and Kariyutaka by the LiCl precipitation procedure [28]. Semi-quantitative RT-PCR was conducted in a 20-μL volume using 30 or 38 cycles of 94°C for 30 s, 57°C for 30 s, and 72°C for 10 s. The transcript levels of the Gly m Bd 28 K and the Gly m Bd 30 K genes were evaluated relative to that of the 18S rRNA gene (XR_003264275).

Genome sequencing
Total DNA was isolated from fresh leaves (1.0-2.0 g) of wild-type and T2 plants as described in [41]. Genomic DNA libraries were constructed using a TruSeq DNA PCR-Free Library Preparation Kit (Illumina, San Diego, USA). Whole-genome sequencing was conducted on an Illumina HiSeq X platform to obtain 151-nt paired-end reads. Approximately 50× coverage data were obtained for each sample. Unintended remaining foreign DNA was detected as described in [42].
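The relationship between sequencing depth and read number can be checked with simple arithmetic: coverage = (read pairs × 2 × read length) / genome size. The sketch below estimates how many 151-nt read pairs correspond to about 50× coverage. The genome size of 1.1 Gb is an assumption introduced here as an approximate figure for soybean; it is not stated in the text above, and the function name is hypothetical.

```python
import math


def read_pairs_for_coverage(coverage, genome_size_bp, read_len_nt):
    """Return the number of paired-end read pairs needed for a target coverage."""
    # Each pair contributes two reads of read_len_nt nucleotides.
    return math.ceil(coverage * genome_size_bp / (2 * read_len_nt))


# Assumption: approximate soybean genome size of 1.1 Gb.
ASSUMED_GENOME_SIZE_BP = 1_100_000_000

pairs = read_pairs_for_coverage(50, ASSUMED_GENOME_SIZE_BP, 151)
print(f"about {pairs / 1e6:.0f} million read pairs")
```

Under this assumption the result is on the order of 180 million read pairs per sample; a different genome size estimate scales the figure proportionally.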
Additional file 1: Table S1. Induction of mutations in the targeted loci and the integration of the Cas9 gene in representative T1 plants from the transformation of Enrei. Table S2. Induction of mutations in the targeted loci and the integration of the Cas9 gene in representative T1 plants from the transformation of Kariyutaka. Table S3. Primer sequences used for vector construction, confirmation of transgenes, and CAPS, semi-quantitative RT-PCR, and sequencing analyses.
Additional file 2: Figure S1. Confirmation of mutagenesis of targeted loci in representative Kariyutaka-T1 plants by CAPS analysis. Figure S2. Detection of mutations in the Gly m Bd 28 K and Gly m Bd 30 K loci in representative Enrei-T2 seeds by CAPS analysis. Figure S3. Detection of mutations in the Gly m Bd 28 K and Gly m Bd 30 K loci in representative Kariyutaka-T2 seeds by CAPS analysis. Figure S4. Detection of the integration of the Cas9 gene in representative T2 seeds by PCR analysis. Figure S5. Mutational spectra of the targeted loci in double-mutant T3 seeds. Figure S6. Alignment of predicted amino acid sequences of the Gly m Bd 28 K locus in double mutants. Figure S7. Alignment of predicted amino acid sequences of the Gly m Bd 30 K locus in double mutants. Figure S8. Full-length gel electrophoresis and immunoblot of the crude protein of representative double-mutant T3 and wild-type mature seeds. Figure S9. Primer sites used for semi-quantitative RT-PCR analysis of the Gly m Bd 28 K and the Gly m Bd 30 K loci. Figure S10. Morphological characteristics of representative double-mutant (T2) and control Kariyutaka plants. Figure S11. Morphological characteristics of representative double-mutant (T3) and control Kariyutaka seeds.