Identification and characterisation of seed storage protein transcripts from Lupinus angustifolius

Background In legumes, seed storage proteins are important for the developing seedling and are an important source of protein for humans and animals. Lupinus angustifolius (L.), also known as narrow-leaf lupin (NLL), is a grain legume crop that is gaining recognition as a potential human health food as the grain is high in protein and dietary fibre, gluten-free and low in fat and starch. Results Genes encoding the seed storage proteins of NLL were characterised by sequencing cDNA clones derived from developing seeds. Four families of seed storage proteins were identified and comprised three unique α, seven β, two γ and four δ conglutins. This study added eleven new expressed storage protein genes for the species. A comparison of the deduced amino acid sequences of NLL conglutins with those available for the storage proteins of Lupinus albus (L.), Pisum sativum (L.), Medicago truncatula (L.), Arachis hypogaea (L.) and Glycine max (L.) permitted analysis of the phylogenetic relationships between proteins and demonstrated, in general, that the strongest conservation occurred within species. In the case of 7S globulin (β conglutins) and 2S sulphur-rich albumin (δ conglutins), the analysis suggests that gene duplication occurred after legume speciation. This contrasted with 11S globulin (α conglutin) and basic 7S (γ conglutin) sequences, some of which appear to have diverged prior to speciation. The most abundant NLL conglutin family was β (56%), followed by α (24%), δ (15%) and γ (6%), and the transcript levels of these genes increased 10^3 to 10^6 fold during seed development. We used the 16 NLL conglutin sequences identified here to determine that for individuals specifically allergic to lupin, all seven members of the β conglutin family were potential allergens. Conclusion This study has characterised 16 seed storage protein genes in NLL including 11 newly-identified members. It has helped lay the foundation for efforts to use molecular breeding approaches to improve lupins, for example by reducing allergens or increasing the expression of specific seed storage protein(s) with desirable nutritional properties.


Background
The genus Lupinus from the legume family (Fabaceae) comprises between 200 and 600 species, of which only a few have been domesticated. Lupinus angustifolius (L.), also known as narrow-leaf lupin (NLL), is a grain legume crop that is gaining recognition as a potential human health food as the grain is high in protein and dietary fibre, gluten-free and low in fat and starch, and thus has a very low Glycaemic Index [1]. Like other legumes, lupin crops are an asset for sustainable cropping in rotations with cereal and oil seed crops. They act as a disease break, allow more options for control of grass weeds and, as nitrogen-fixing legumes, reduce the need for fertilizers and enrich the soil for subsequent crops [2].
Recently considerable interest has been directed towards legume seed proteins, with studies demonstrating nutritional, nutraceutical and health benefits [3,4]. With increased awareness in many societies of the escalating incidence of obesity and the associated risk of diabetes and cardiovascular disease, NLL is an excellent candidate as a healthy food.
The major proteins in legume seeds are storage proteins, defined as any seed protein that accumulates in significant quantities, has no known function during seed development, and is rapidly hydrolysed upon germination to produce a source of N and C for the early stages of seedling growth [3,5]. Seed storage proteins have been classified into four families, termed 11S globulin (also known as α conglutin, legumin, legumin-like and glycinin), 7S globulin (also known as β conglutin, vicilin, convicilin and vicilin-type), 7S basic globulin (also known as γ conglutin) and 2S sulphur-rich albumin (also known as δ conglutin). For simplicity, in this study we will refer to the lupin seed storage proteins as α, β, γ and δ conglutins.
Specific nutritional and pharmaceutical attributes have been assigned to lupin conglutins [3]. White lupin (L. albus) γ conglutin has structural similarity with xyloglucan-specific endo-beta-1,4-glucanase inhibitor proteins (XEGIPs) and Triticum aestivum xylanase inhibitor (TAXI-1) [6], is able to bind to the hormone insulin and to the insulin-like growth factors IGF-I and IGF-II [7,8], and may be able to play a pharmaceutical role similar to the hypoglycaemic drug metformin [8]. NLL grain has satiety properties, because food enriched with lupin seed protein and fibre significantly influences subsequent energy intake [9]. Furthermore, bread enriched with NLL protein and fibre may help reduce blood pressure and the risk of cardiovascular disease [10,11].
As seen with the majority of edible legume grains, seed proteins from lupin species can cause allergy in a small percentage of the population [12]; 'lupin allergy' occurs either separately or together with peanut allergy or allergy to other legumes [12,13]. Peanut-lupin cross allergy has been reported in which IgE antibodies that recognise peanut allergens also cross react with NLL conglutins [14,15]. One study has proposed that all lupin conglutin families are candidate allergens [16]. However, other studies have found that α and γ conglutins are the main allergens from white lupin [17] whilst patients who were allergic specifically to NLL and not peanut had serum IgE that bound β conglutins [18].
Here we analysed NLL seed ESTs at the molecular level through the construction and sequencing of a cDNA library made from seed mRNA isolated at the major filling stage. We identified ESTs from genes belonging to each of the four conglutin families. In total 16 members were identified, eleven of which had not been described previously. These NLL conglutins are in addition to conglutins identified from the only other characterized lupin, L. albus for which nine conglutin sequences have been deposited in GenBank [19]. The NLL conglutin sequences were compared to each other and to other legume seed storage proteins providing an insight into the evolution of these proteins in grain legumes. We also examined the specific gene expression profiles of the NLL conglutin genes and demonstrate that the expression of each is increased significantly during seed filling. This comprehensive identification of the NLL conglutins opens up the gateway to better characterise lupin molecular biology, physiology, biochemistry and nutrition.

Isolation of new NLL conglutin genes
A cDNA library was constructed from NLL seed at 20-26 DAA (days after anthesis), which coincided with the major seed-filling stage. Three unique α, seven β, two γ and four δ conglutin sequences were identified after sequencing 3017 ESTs. ABR21772.1] that has identity to BETA1 between amino acids 1-445, but is truncated as it contains a premature stop codon. In this study we identified a further 11 new conglutin sequences consisting of two α, five β, one γ and three δ conglutin sequences. Within each family the sequences were aligned using the CLC Genomics Workbench 3 software [20] as shown in Figure 1.

Comparison of NLL conglutins to other legume sequences
Seed storage protein homologues from Glycine max (soybean), Pisum sativum (pea), Arachis hypogaea (peanut), Medicago truncatula and Lupinus albus (white lupin) that had BLAST sequence alignment scores greater than 200 when compared to any of the 16 NLL conglutin protein sequences were identified from the NCBI non-redundant protein database. These sequences were compared to each other within each family using a distance-based method from the CLC Genomics Workbench 3 software [20]. Members of each family were identified from all plant species examined with the following exceptions: there were no M. truncatula 11S globulin or 2S sulphur-rich sequences, no peanut 7S basic globulin sequences and no pea 7S basic globulin or 2S sulphur-rich sequences. The number of protein sequences identified for each species may not accurately represent the final number of members in each group. In some cases, they may be under-represented, as some genes may lack homology to the NLL conglutins used in this analysis or are yet to be identified. Alternatively, they may be over-represented, as two or more proteins may be derived from the same gene via processing [21]. Figure 3 presents the phylogenetic relationship between seed storage protein families from the six legume species studied. For simplicity all the sequences were renamed with the species initials, followed by a number. The corresponding accession numbers are listed in Table 1.
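The homologue selection rule described above (a BLAST score greater than 200 against any of the 16 NLL conglutins) can be illustrated with a minimal sketch. The hit records, subject identifiers and the function name `select_homologues` are hypothetical placeholders introduced for illustration only; they are not data or tooling from this study, and the way scores were actually exported from BLAST is not specified in the text.

```python
# Hypothetical BLAST hits: (NLL query, subject identifier, alignment score).
# These rows are invented placeholders, not results from the study.
hits = [
    ("ALPHA1", "subject_A", 512.0),
    ("ALPHA1", "subject_B", 150.0),
    ("BETA4", "subject_B", 230.5),
    ("DELTA2", "subject_C", 88.0),
]

SCORE_THRESHOLD = 200  # threshold stated in the text


def select_homologues(hits, threshold=SCORE_THRESHOLD):
    """Keep subjects scoring above the threshold against ANY query.

    Returns a dict mapping subject -> (best-scoring query, best score).
    """
    best = {}
    for query, subject, score in hits:
        if score <= threshold:
            continue  # below the cut-off for this query
        if subject not in best or score > best[subject][1]:
            best[subject] = (query, score)
    return best


selected = select_homologues(hits)
for subject, (query, score) in sorted(selected.items()):
    print(subject, query, score)
```

In this toy input, a subject that fails the threshold against one NLL conglutin is still retained if it passes against another, which mirrors the "any of the 16" wording.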
While most 11S globulin protein sequences showed the highest homology with other members in the same species, there were exceptions; for example, the NLL ALPHA1 is more homologous to sequences from white lupin (La1) and peanut than to NLL ALPHA2 or ALPHA3 (Figure 3A). In general, 7S globulin sequences showed greatest identity within species (Figure 3B). For example all NLL β conglutin-like sequences were more homologous to each other than to 7S globulin sequences from other legume species. This was also the case for soybean, M. truncatula and peanut. The pea seed storage protein phylogenetic relationship was more complicated, with three of the four groups being more diverged from each other than from seed storage proteins from other legumes. In the case of basic 7S sequences, the soybean basic 7S sequences were species specific with the exception of Gm5, which shared similar sequence identity with all basic 7S sequences. However this was not seen with white lupin and NLL where GAMMA1 [Genbank:HQ670416] and GAMMA2 [Genbank:HQ670417] were more similar to La1 and La2 [22] than each other (Figure 3C). Furthermore, the basic 7S Mt1 sequence was more homologous to white lupin, NLL and soybean sequences than other basic 7S sequences from M. truncatula. The 2S sulphur-rich albumin sequences shared the highest sequence homology within each legume species (Figure 3D), although NLL DELTA4 is quite distinct from the other NLL δ conglutin sequences.

Table 1. Arachis hypogaea (Ah), Glycine max (Gm), Medicago truncatula (Mt), Lupinus albus (La), and Pisum sativum (Ps) identification and accession numbers used in Figure 3. Columns: ALPHA homologues, BETA homologues, GAMMA homologues, DELTA homologues.

Changes in expression of NLL conglutins during seed development
Sequencing of ESTs from NLL seed (20-26 DAA) identified 42% as conglutins. The EST sequencing also identified expression of other major groups of genes including those encoding ribosomal proteins, protein translation factors, oleosins and seed maturation proteins. The 16 unique NLL conglutin genes were used as reference sequences against all 3017 ESTs using the CLC Genomics Workbench 3 software [20]. Based on transcript levels, the most abundant conglutin family was β (56%), followed by α (24%), δ (15%) and γ (6%). The proportion (and total number) of ESTs corresponding to a particular conglutin gene within each conglutin family is presented in Figure 4, and this provides an estimate of the relative expression levels of each conglutin at 20-26 DAA. Figure 5 presents the relative expression of each conglutin gene over the time course of NLL seed development using specific primers for each conglutin gene.
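As a rough arithmetic check on the figures above, the reported percentages can be converted back into approximate EST counts. The sketch below uses only numbers stated in the text (3017 ESTs, 42% conglutins, and the four family shares); the resulting counts are back-calculated approximations from rounded percentages, not the counts reported in Figure 4.

```python
TOTAL_ESTS = 3017          # ESTs sequenced (from the text)
CONGLUTIN_FRACTION = 0.42  # fraction identified as conglutins (from the text)

# Reported family shares of conglutin ESTs, in percent (from the text).
family_percent = {"beta": 56, "alpha": 24, "delta": 15, "gamma": 6}

# Approximate number of conglutin ESTs implied by the rounded 42% figure.
conglutin_ests = TOTAL_ESTS * CONGLUTIN_FRACTION

# Approximate per-family counts implied by the rounded family shares.
approx_counts = {
    family: round(conglutin_ests * pct / 100)
    for family, pct in family_percent.items()
}

# The rounded shares need not sum to exactly 100.
percent_total = sum(family_percent.values())

print(f"approx. conglutin ESTs: {conglutin_ests:.0f}")
for family, count in approx_counts.items():
    print(f"{family}: ~{count} ESTs")
print(f"sum of reported shares: {percent_total}%")
```

The reported shares sum to 101%, which is consistent with each value having been rounded to the nearest whole percent.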

Proteomic identification of NLL conglutins and IgE binding conglutins
With the availability of full-length sequences for NLL seed storage proteins derived from this work, it was possible to analyse the mass spectrometry results from the analysis of 2D blots [18] that had been probed with serum from individuals specifically allergic to lupin. Here the IgE-binding spots identified originally as β conglutin were analysed and many could be further classified into isoforms (BETA1-7). The identities of each spot are shown in Table 2 and Figure 6. BETA4 was the top match for the majority (11) of these spots. Only one spot could be unequivocally matched to BETA1, two to BETA2, one to BETA3, three to BETA5 and one to BETA7, although this does not rule out that other undetected β isoforms may be present in these spots. In addition there may be peptide contamination from spots that are not convincingly separated. No spots could be matched exclusively to BETA6, as for three of the spots it was not possible to distinguish between BETA6 and BETA4. There was evidence that a number of spots either contained protein from more than one β conglutin isoform or that there are other β conglutin forms that have not been identified in this study. As the seven β conglutin isoforms are conserved over the whole protein (Figure 1B), no potential epitope(s) could be deduced.
Three spots corresponding to GAMMA1 (spots 1, 6 and 59) were identified with the peptide coverage matching the sequence of the mature protein rather than that of the unprocessed precursor [23]. The newly synthesised protein is first cleaved to remove a hydrophobic signal peptide and then a second time to produce large and small subunits [23,24]. Peptides identified for spot 59 matched the large subunit and spots 1 and 6 matched the small subunit, which covers the C-terminus of the deduced protein (Table 2 and Additional file 1). There was no evidence of spots corresponding to the GAMMA2 protein. GAMMA1 (spot 6) showed IgE binding; however, with the higher resolution available for this analysis, it was clear that the spot was contaminated with BETA4 protein and this may explain why it appeared to bind IgE.
Mass spectrometric analysis of a number of major representative spots that did not bind IgE (spots 87-89, 99 and 100) identified them as α conglutin, with ALPHA1, ALPHA2 and ALPHA3 being present (Figure 6 and Table 2).

Discussion
This study identified 16 conglutin genes belonging to four families in NLL, of which only five had been identified previously. It also significantly extended our knowledge base of seed storage proteins in lupin in general, and allowed useful comparisons with the other characterised species including L. albus, for which nine members have been identified in GenBank. Sequence alignment of the NLL conglutins to homologous sequences from M. truncatula, soybean, pea, peanut and white lupin illustrated that, in general, the strongest conservation occurred within species. In the case of β and δ conglutins, our analysis suggests that gene duplication occurred after legume speciation. This was in contrast to α and γ homologous sequences, some of which were likely to have diverged prior to speciation. The largest family in NLL was the β conglutins with seven members, while the α, γ and δ conglutin families ranged in size from two to four members. It remains to be determined if there are functional differences within each of the families. In the case of α and β conglutins, the differences between family members often involved insertions/deletions of repeated amino acid stretches of predominantly glutamic acid (E), glutamine (Q), serine (S), glycine (G) and arginine (R). These amino acids have a low hydropathy index, suggesting that the peptide regions involved are likely to be found towards the surface of the protein.
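The hydropathy argument can be made concrete with a small sketch. It assumes the Kyte-Doolittle hydropathy scale, which the text does not name explicitly, and it uses an invented repeat-like peptide (`EQSGREQ`) purely for illustration; this is not a sequence taken from any NLL conglutin.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text only says
# "hydropathy index"). Negative values indicate hydrophilic residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over all residues of a peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


# Residues named in the text as dominating the inserted/deleted stretches.
repeat_residues = "EQSGR"
for aa in repeat_residues:
    print(aa, KYTE_DOOLITTLE[aa])

# Hypothetical repeat-like stretch, for illustration only.
toy_repeat = "EQSGREQ"
print(f"mean hydropathy of {toy_repeat}: {mean_hydropathy(toy_repeat):.2f}")
```

On this scale all five residues score below zero, so any stretch built mainly from them has a negative mean hydropathy, which is the basis for expecting such regions to be solvent-exposed.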
There have been a number of studies of developmental processes in legume seeds [25]. During the cell enlargement (seed-filling) phase of seed development, N accumulation and protein synthesis rely on both symbiotic N2 fixation and uptake of N from the soil [26]. Proteins involved in cell division are abundant during early stages of seed development, and their level decreases before the accumulation of the major storage proteins during seed-filling [27]. Our expression data provide evidence that in NLL, the conglutins began to be expressed at relatively high levels from 9-12 DAA, and peaked between 33-38 DAA, which corresponds to the seed filling stage [28]. While the general induction pattern appears similar, there are small differences between individual genes; for example ALPHA1 appears to both increase and then decrease earlier than the other two α genes. Whether these small variations are important in seed development remains to be determined, although it is interesting to note that phylogenetically, ALPHA2 and ALPHA3 are more closely related to each other than to ALPHA1. The similar induction pattern of all tested conglutin genes suggests that their expression may be regulated by a common mechanism and there may be a master regulator(s) to ensure overall protein quantity within the seed is maintained. Consistent with this hypothesis are the results from gene silencing of soybean β-conglycinin protein (7S globulin), which caused an increase of glycinin (11S globulin) [29]. In addition, there is likely to be fine tuning with post-transcriptional regulation of storage protein synthesis in response to N and S supply [30], and other environmental variations [31].
The deduced precursor proteins of NLL conglutins each have different molecular masses and isoelectric points (pIs) but this cannot be used to predict the mobility of the processed mature proteins on a 2D gel [32]. This has also been recorded for L. albus conglutins [33,34] where 124 polypeptide spots fell into the α, β and γ conglutin families [35]. Analysis of the peptide coverage on the BETA spots may give some indication about the processing of β conglutin precursors to produce the mature protein. Three of the largest β conglutin spots of 48.8 kDa (spots 3, 8, 18) do not have any peptides identified that cover the N-terminal 108 amino acids, suggesting that this region is cleaved in a similar manner to that for peanut Ara h1 [36]. Similarly, many of the smaller spots contain peptides only in the region from amino acid 410 to the C-terminus, suggesting a second site of cleavage. These observations would need to be confirmed by N-terminal sequencing of the different spots. It is possible that differential glycosylation or some other modification may also contribute to the large number of β conglutin spots, given that many spots corresponded to the same region of a particular β conglutin form with only slight differences in size and pI (e.g. spots 52, 55 and 59).
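The peptide coverage reasoning used above, in which the absence of peptides over the first 108 residues hints at N-terminal cleavage, can be sketched as follows. The peptide coordinates are hypothetical and chosen only so that the example reproduces a 108-residue uncovered region; the real peptide matches are those listed in Additional file 1, and the function names are introduced here for illustration.

```python
def covered_positions(peptides):
    """Return the set of 1-based residue positions covered by peptides.

    Each peptide is a (start, end) pair, inclusive, on the precursor sequence.
    """
    covered = set()
    for start, end in peptides:
        covered.update(range(start, end + 1))
    return covered


def uncovered_n_terminal_length(peptides):
    """Number of residues preceding the first covered position.

    Returns None when no peptides were identified for the spot.
    """
    covered = covered_positions(peptides)
    if not covered:
        return None
    return min(covered) - 1


# Hypothetical peptide ranges for one spot (illustrative values only).
spot_peptides = [(109, 121), (150, 163), (300, 312)]

n_gap = uncovered_n_terminal_length(spot_peptides)
print(f"residues without coverage at the N-terminus: {n_gap}")
```

A missing stretch of coverage is only suggestive, since peptides can go undetected for other reasons, which is why the text notes that N-terminal sequencing would be needed for confirmation.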
Peanut is regarded as the most severe allergenic hazard among legume seed proteins and is the best studied with respect to allergenicity. Each of the three main peanut allergens has a homolog among the lupin conglutins. Thus α conglutin corresponds to Ara h3 [37,38], β conglutin to Ara h1 [39], and δ conglutin to Ara h2 [40]. In addition, each protein has the potential to have multiple allergenic sites; for example, Ara h3 contains eight distinct epitopes and most of these differ from the corresponding regions of other legume and tree-nut allergens [41]. Identification of specific allergens and their IgE binding epitopes is an important step if low-allergen traits are to be developed. For example, markers have been utilized to identify germplasm with reduced expression of the allergenic soybean seed P34 protein [42]. The lack of conservation of allergenic epitopes between species, and the fact that many different proteins can be allergenic, makes identifying allergens across species by comparative studies difficult, and therefore the IgE-binding of each potential allergenic protein must be assessed.
Individuals allergic to peanut and lupin may react to different proteins from those recognised by individuals who react only to lupin. A previous study based on the limited lupin conglutin sequences available at that time found that for individuals allergic to lupin but not other legumes, β conglutin was likely to be the major allergen [18]. Our results, which are based on the analysis of 16 NLL conglutin proteins, confirm and extend this earlier study and show that all β conglutin members are potential allergens, while members from other conglutin families are unlikely to be contributing to lupin specific allergenicity. At this stage it is not clear if there is a single common epitope on the β conglutins responsible for this form of allergenicity or if there are several epitopes, but it seems likely that the different forms of β conglutin share some common epitopes. When the epitope(s) that cause allergic reactions to β conglutin have been identified, breeding of varieties with reduced allergens may be possible. For example, domesticated and wild lupin could be screened for lines having reduced expression of specific allergenic conglutins, or biotechnology strategies could be employed to reduce the levels of allergens in developing seeds. Already techniques using RNA interference (RNAi) to target allergen genes in peanut and tomato are showing encouraging results [43,44] and this approach can now be extended to NLL, for which transformation systems are in place [45].

Peptide matches for each spot are listed in Additional file 1. *Although the score for this isoform is lower than the other matches, there are peptides that specifically match these forms.

Conclusions
This study has found that L. angustifolius has at least 16 seed storage protein genes that fall into four families. Analysis of the expression of each gene during seed development showed that all 16 genes share similar expression patterns and are most highly expressed 33-38 days after anthesis, which corresponds to the period of maximum seed filling [28]. Comparative studies with other legumes have provided insight into the evolution of these genes, with evidence of gene duplication occurring after speciation in some cases. Lupin seeds, like those from other grain legumes, contain allergenic proteins and our studies have identified that all seven members of the β conglutin family are potential allergens for people specifically allergic to lupins. These results provide opportunities to further characterize lupins at many levels, including the molecular, physiological, biochemical and nutritional levels.

Figure 6. Identification of L. angustifolius seed storage proteins. Lupin flour proteins were separated by 2D-PAGE and (A) stained with Coomassie blue or (B) blotted onto a membrane, which was probed with serum from lupin-allergic individuals, to identify potentially allergenic IgE-binding proteins. Protein spots that were either IgE-binding (spots 3-59, 94) or non-IgE-binding (spots 87-89, 97-114) were analysed by mass spectrometry, and those for which identifications were made are enclosed by ovals, with different colours corresponding to different proteins as shown in the figure. Sections of the gel and blot (boxes i, ii and iii) have been enlarged to show more detail, with Coomassie-blue stained gels on the top and IgE-binding proteins on the bottom panel for each section. In the enlarged boxes i, ii and iii, spots that bind IgE are shown in black and those that do not in red. IgE binding was determined by aligning the original film and the Coomassie-blue stained gel, but for 4 spots (37, 38, 51, 57) the resolution of the gel does not give a clear image of this binding. 'Beta?' indicates spots for which the form of conglutin β could not be determined.