- Research article
Genomics and evolutionary aspect of calcium signaling event in calmodulin and calmodulin-like proteins in plants
BMC Plant Biology volume 17, Article number: 38 (2017)
The Ca2+ ion is a versatile second messenger that operates in a wide range of cellular processes that impact nearly every aspect of life. Ca2+ regulates gene expression and biotic and abiotic stress responses in organisms ranging from unicellular algae to multicellular higher plants through cascades of calcium signaling processes.
In this study, we deciphered the genomic and evolutionary aspects of the calcium signaling events of calmodulin (CaM) and calmodulin-like (CML) proteins. We studied the CaM and CML gene families of 41 different species across the plant lineages. Genomic analysis showed that plants encode more calmodulin-like proteins than calmodulins. Further analyses showed that the majority of CMLs were intronless, while CaMs were intron rich. Multiple sequence alignment showed that the EF-hand domain of CaM contains four conserved D-x-D motifs, one in each EF-hand, while CMLs contain only one D-x-D-x-D motif, located in the fourth EF-hand. Phylogenetic analysis revealed that the CMLs evolved earlier than the CaMs and later diversified. Gene expression analysis demonstrated that different CaM and CML genes were expressed differentially in different tissues in a spatio-temporal manner.
In this study we provide a detailed genome-wide identification and characterization of the CaM and CML protein families, their phylogenetic relationships, and their domain structure. Expression studies of CaM and CML genes were conducted in Glycine max and Phaseolus vulgaris. Our study provides a strong foundation for future functional research on the CaM and CML gene families in the plant kingdom.
In the nuclear fusion of stars and the sun, the elements evolved from hydrogen. During this process, the element calcium (Ca) was born by successive capture of α particles by oxygen and neon [1, 2]. After about 10 billion years, the cell membrane most likely showed its charged activity locally with relentless entropy. To adapt to a changing environment, a cell must respond to changing environmental signals, and cellular signaling requires an efficient messenger that can move through all parts of the cell to convey the message. The calcium ion commonly fulfills this signaling role. The concentrations of signaling molecules vary in the cell with time and environmental conditions. The Ca2+ concentration differs roughly 20,000-fold between the intracellular (~100 nM) and extracellular (~2 mM) compartments. Cells use a great deal of energy to induce changes in Ca2+ concentration and to stabilize the cell. The concentration of Mg2+, popularly known as a cousin of Ca2+, does not differ greatly across the cellular compartments. The question then arises: why is the concentration of Ca2+ so low in the cytosol? This is because Ca2+ binds water molecules much less tightly than Mg2+ does and readily precipitates phosphate. If Ca2+ concentrations in the cytosol were higher, Ca2+ would bind phosphate and turn the cell into a bone-like structure. Unlike complex molecules, Ca2+ cannot be altered chemically. It is therefore necessary to control the cytosolic Ca2+ concentration to avoid precipitation with phosphate in the cytosol. Hence, cells have developed mechanisms to control the cytosolic Ca2+ concentration by chelating, compartmentalizing or extruding the ion from the cell, and hundreds of proteins have evolved to bind the Ca2+ ion over a million-fold range of affinities (nM to mM) to buffer or lower the Ca2+ level in the cell.
Among the most important protein chelators of the Ca2+ ion are the EF-hand domain containing proteins. Hundreds of EF-hand containing proteins are present in plants, and they occur as protein families. Some of the important EF-hand domain containing protein families are calcium dependent protein kinase (CDPK) [3, 4], calcium dependent protein kinase related kinase (CRK), calcineurin-B like (CBL), calmodulin (CaM) and calmodulin-like (CML) proteins. The CDPK contains a kinase domain, an auto-inhibitory domain and a regulatory domain that contains four calcium binding EF-hands, while the CRK contains a kinase domain, an auto-inhibitory domain and a regulatory domain that contains only three calcium binding EF-hands. Additionally, the CBL contains only three EF-hands and no kinase domain, while CaM and CML contain four EF-hands and lack a kinase domain [3, 6, 7]. The calcium ion binds to the Asp (D) or Glu (E) amino acids of the EF-hands. The D and E amino acids in the EF-hands are reported to be conserved and present as D-x-D or D/E-E-L motifs [8, 9]. The D-x-D motifs are conserved at the 14th, 15th and 16th positions of the EF-hands [8, 9]. Detailed investigations of different genomic and evolutionary aspects of the CDPK and CBL protein families have been discussed recently [8, 9]. However, little information is available regarding detailed study of the CaM and CML gene families in plants. Therefore, in this study, we conducted genome-wide identification of CaM and CML gene family members in plants and analyzed their genomic and evolutionary aspects. Along with the reports on the CDPK and CBL protein families, this study unveils the genomic aspects of calcium signaling events in plants and the calcium signature motifs in EF-hand domains.
Results and discussion
Genomics of CaMs and CMLs
Genome-wide identification of calmodulin (CaM) and calmodulin-like (CML) gene family members showed that plants encode more CMLs than CaMs (Table 1). The genome of the green alga Ostreococcus lucimarinus was 13.2 Mb in size and encoded only two CaMs. Coccomyxa subellipsoidea and Chlamydomonas reinhardtii encoded three and six CaMs, respectively. The genomes of Brassica rapa and Mimulus guttatus were 283.8 and 321.7 Mb, respectively; both species are diploid and each encoded 13 CaMs. The genome of Eucalyptus grandis was 691 Mb and contained only one CaM gene.
The average number of CaMs in plants was 6.60 per genome, and the majority of plants encoded fewer than 10 CaMs. The size of the plant genome varies from species to species, and these variations depend on the ploidy and duplication events of the genome. However, variations in the number of genes in a gene family were not directly correlated with the genome size, ploidy or genome duplication events of an organism. Correlation and regression analysis of CaM and CML numbers against genome size showed that they are not correlated (Fig. 1). The correlation coefficient was r = 0.2267 for CaMs and r = 0.1569 for CMLs. The tetraploid species Glycine max and Panicum virgatum encoded eight and nine CaMs, respectively, fewer than the diploid species B. rapa and M. guttatus (Table 1, Additional file 1: Table S1). Normal distribution analysis showed that the probability of a genome encoding more than one CaM was 0.9767 (97.67%) (Table 2). Similarly, the probability of a genome encoding more than 13 CaMs was only 0.0113 (1.13%). Details of the probability distribution of CaMs among different groups of organisms in the plant lineage are given in Table 2. These findings show that the number and type of genes in a genome depend on evolutionary pressure and on the functional requirements and complexity of the plant. Two-sample t-tests between CML and CaM were conducted, and the means for CML and CaM were 20.26 and 6.60, respectively (Table 3). The t-values of the unpaired and paired t-tests were 8.91 and 10.43, respectively.
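As a minimal sketch of how such a correlation coefficient is obtained, the Python snippet below computes Pearson's r between genome size and CaM number. It uses only the four genome sizes and CaM counts quoted in the text above (O. lucimarinus, B. rapa, M. guttatus and E. grandis), so it illustrates the calculation and is not expected to reproduce the r = 0.2267 reported for all 41 species. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Genome size (Mb) and CaM count for the four species quoted in the text.
genome_size_mb = [13.2, 283.8, 321.7, 691.0]
cam_count = [2, 13, 13, 1]

r = pearson_r(genome_size_mb, cam_count)
print(f"r = {r:.4f}")
```

A value of r close to zero, as reported for both gene families, means that genome size explains very little of the variation in gene number.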
Genome-wide analysis of the CML gene family in plants showed that the green algae C. subellipsoidea, O. lucimarinus, and C. reinhardtii encoded lower numbers of CMLs than the higher plants (Table 1). The genomes of C. subellipsoidea and O. lucimarinus encoded only three and two CaMs, respectively. These two species encoded the same number of CMLs, whereas C. reinhardtii encoded more CaMs (six) than CMLs (three). Conversely, O. lucimarinus encoded equal numbers of CaMs and CMLs (two). Like O. lucimarinus, V. carteri was also found to contain equal numbers of CaMs and CMLs (four) (Table 1). A. thaliana encoded the maximum of 47 CML genes, while B. rapa encoded 36 CMLs. The tetraploid species G. max and P. virgatum encoded 26 and 20 CMLs, respectively. The monocot O. sativa encoded 33 CMLs, while S. bicolor and P. hallii encoded 27 CMLs each. P. patens and S. italica encoded 17 CMLs each; C. clementina, F. vesca, and M. guttatus encoded 19 each; and A. coerulea, C. sativus, L. usitatissimum, P. persica, and Z. mays encoded 21 CMLs each. G. max, S. lycopersicum, S. tuberosum, and T. halophila encoded 27 CMLs each. This distribution shows that several plant species encode the same number of CML genes while others do not. Percentage comparison between CaM and CML showed that T. cacao encoded 700% and O. sativa 660% more CMLs than CaMs (Table 1). Normal distribution analysis showed that the probability of occurrence of more than two CMLs in a plant genome was 0.9706 (97.06%), while the probability of occurrence of more than 47 CMLs was only 0.0024 (0.24%) (Table 2). Details of the probability distribution of CMLs among different groups are given in Table 2. Student's t-test was conducted to assess the significance of the difference in gene numbers between the CaM and CML gene families.
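The tail probabilities quoted for CaMs and CMLs can be illustrated with a normal approximation. The sketch below uses the reported means (6.60 and 20.26), but the standard deviations are not given in the text: the values 2.8 and 9.6 are assumed placeholders, chosen only so that the sketch lands near the reported probabilities. The helper name `prob_more_than` is likewise illustrative.

```python
from statistics import NormalDist


def prob_more_than(threshold, mean, sd):
    """Upper-tail probability P(X > threshold) for X ~ Normal(mean, sd)."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Means are taken from the text (Table 3). The standard deviations are
# ASSUMED placeholder values, not figures reported in the text.
CAM_MEAN, CAM_SD_ASSUMED = 6.60, 2.8
CML_MEAN, CML_SD_ASSUMED = 20.26, 9.6

print(f"P(CaM > 1)  = {prob_more_than(1, CAM_MEAN, CAM_SD_ASSUMED):.4f}")
print(f"P(CaM > 13) = {prob_more_than(13, CAM_MEAN, CAM_SD_ASSUMED):.4f}")
print(f"P(CML > 2)  = {prob_more_than(2, CML_MEAN, CML_SD_ASSUMED):.4f}")
print(f"P(CML > 47) = {prob_more_than(47, CML_MEAN, CML_SD_ASSUMED):.4f}")
```

With the actual standard deviations from Table 2, the same function would return the tabulated probabilities directly.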
Both unpaired and paired t-test analyses showed that the CaM and CML gene family sizes were significantly different from each other (Table 3). These changes in gene family size and the unequal distribution of CaMs and CMLs may be attributed to ploidy level and to the different cellular processes required by different plants, but they were not related to genome size (Fig. 1). In principle, the addition or evolution of more genes or genomic content will increase genome size, but the converse (an increase in genome size leading to more genes in a genome) is not true. This might have occurred because of the different cellular and ecological strategies associated with adaptation and expansion of the gene family [10–12]. The variations in gene family size were largely attributed to the important mechanisms that shape natural variation and adaptation in different species.
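The two t statistics can be sketched as follows. The snippet assumes the unpaired test is the pooled-variance Student's form (the text does not say whether a pooled or Welch variant was used) and runs on only six species whose CaM and CML counts are both quoted in the text, so its output is illustrative and will differ from the values in Table 3. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumed form)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic computed on the per-species differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


# Subset of species with both counts quoted in the text, in this order:
# O. lucimarinus, C. reinhardtii, V. carteri, B. rapa, P. virgatum, M. guttatus
cml = [2, 3, 4, 36, 20, 19]
cam = [2, 6, 4, 13, 9, 13]

print(f"unpaired t = {unpaired_t(cml, cam):.2f}")
print(f"paired t   = {paired_t(cml, cam):.2f}")
```

The paired form treats each species as its own control, which is why it can give a larger t value than the unpaired form when CaM and CML numbers rise and fall together across species.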
CMLs and CaMs contain varied numbers of introns
Genome-wide analysis of the CML gene family in plants revealed that the larger part of the CMLs were intronless. Among the 831 CMLs studied from 41 species, 596 genes (71.72%) were intronless (Additional file 2: Table S2), whereas 79 had one intron (9.5%), 24 had two introns (2.88%), 44 had three introns (5.29%), 29 had four introns (3.48%), and 15 had five introns (1.8%). Only a few CMLs contained six, seven, eight or nine introns, and none contained ten or more introns (Additional file 2: Table S2). In contrast to the CMLs, the majority of CaMs contained introns. Among the 271 CaMs studied from 41 species, 14 (5.16%) were intronless, 113 (41.69%) contained one, 35 (12.91%) contained two, 86 (31.73%) contained three, six (2.21%) contained four, five (1.84%) contained five, and seven (2.58%) contained six introns. The evolutionary perspectives regarding the presence of introns in eukaryotic protein coding genes are not yet clear. However, Mattick reported that introns can function as transposable elements and that nuclear introns originated from self-splicing group II introns, which later evolved in conjunction with the spliceosome. It is assumed that these introns evolved after divergence from the prokaryotes and later became established in the eukaryotic genome with new genetic space and function, which provided a positive pressure for their expansion. According to this concept, it can be speculated that, because the majority of CMLs are intronless, they can be considered older than CaMs. A few CMLs contain introns, and it is believed that these introns evolved recently, along with the CaMs. This explains why the intron-containing CMLs mostly contain only one intron (9.5%). Similarly, a few CaMs were also intronless (5.16%), which indicates that the genome has yet to incorporate introns into these CaMs.
Some other CaMs contain either one (42.44%), two (34.81%) or three (32.22%) introns. This may be because these introns were added recently and the genome has not had ample time to add more introns into the CaMs. Similarly, the introns present in CMLs are assumed to have been added recently. A major evolutionary event such as the addition of more introns into a gene requires sufficient time.
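The intron shares can be recomputed directly from the gene counts. The short sketch below does this for the counts quoted above (831 CMLs and 271 CaMs); the dictionary and function names are illustrative, and the last digit may differ slightly from the quoted figures depending on rounding.

```python
def percentages(counts, total):
    """Map each intron number to its share of `total`, in percent."""
    return {introns: 100.0 * n / total for introns, n in counts.items()}


# Number of genes per intron count, as quoted in the text.
cml_counts = {0: 596, 1: 79, 2: 24, 3: 44, 4: 29, 5: 15}       # of 831 CMLs
cam_counts = {0: 14, 1: 113, 2: 35, 3: 86, 4: 6, 5: 5, 6: 7}   # of 271 CaMs

cml_pct = percentages(cml_counts, 831)
cam_pct = percentages(cam_counts, 271)

for introns in sorted(cam_pct):
    cml_share = cml_pct.get(introns)
    cml_text = f"{cml_share:6.2f}%" if cml_share is not None else "   n/a "
    print(f"{introns} introns: CML {cml_text}  CaM {cam_pct[introns]:6.2f}%")
```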
According to the intron-late hypothesis, introns are a eukaryotic novelty and new introns emerge continuously during the evolution of the eukaryotic genome. Different genes in eukaryotic organisms differ dramatically in terms of intron density and size distribution. In some cases, zero to six introns per kilobase were observed in the eukaryotic genome [15, 16]. Comparative analysis of exon-intron structures of orthologous genes in higher eukaryotic organisms revealed that they share approximately 25% to 30% of their introns. The presence of 71.72% intronless genes among CMLs shows that the CMLs of plants are highly orthologous, conserved genes in the plant kingdom that evolved from a common ancestor. Similarly, the 42.44%, 34.81% and 32.22% similarity for genes containing one, two and three introns, respectively, shows their close homology with orthologous genes. Intron loss events dominate over short evolutionary distances, whereas intron gain accompanies important evolutionary transitions. Intron gain is an ongoing process, and a high rate of intron gain has been reported for paralogous genes in the model plants Arabidopsis thaliana and Oryza sativa [17–19]. The shared introns were likely derived from a common ancestor of the corresponding species, while the lineage-specific introns were introduced into the genes at subsequent stages of evolution.
CaM contains four D-x-D motifs and CML contains one D-x-D-x-D motif in their EF-hands
CaMs and CMLs are evolutionarily conserved gene families of plants; it was therefore important to understand their conserved domains and motifs. Hence, we conducted multiple sequence alignments of CaM and CML protein sequences separately, which revealed the presence of several conserved domains and motifs. The CaM protein contains four calcium binding EF-hands (Fig. 2). Multiple sequence alignment of CaMs revealed the presence of four D-x-D motifs in the four EF-hand domains (Fig. 3, Additional file 3: Figure S1). Each EF-hand domain contains one D-x-D motif, conserved at the 14th and 16th positions in all of the EF-hands. In addition to the D-x-D motif, the first EF-hand contains a conserved E-x2-E motif at the 5th and 8th positions and a conserved E amino acid at the 25th position (Fig. 3). The second EF-hand contained conserved E amino acids at the 5th and 12th positions; a conserved D-F-x-E-F domain at the 22nd, 23rd, 25th and 26th positions; and a conserved D amino acid at the 36th position (Fig. 3). The third EF-hand contained conserved D and E amino acids at the 1st and 8th positions, respectively, and conserved E amino acids at the 25th and 36th positions. The fourth EF-hand contained conserved E amino acids at the 4th, 5th, 12th and 25th positions. A conserved E amino acid was present at the 5th position in the first, second and fourth EF-hands. Similarly, a conserved E amino acid was present at the 25th position in all four EF-hands (Fig. 3). The first and fourth EF-hands contain no conserved amino acid at the 36th position, while the second and third EF-hands contain a conserved D and E amino acid, respectively, at the 36th position.
Like the CaMs, CMLs were also found to contain four calcium binding EF-hand domains (Fig. 4). Each EF-hand is around 36 amino acids long and contains conserved aspartate (D) and glutamate (E) amino acids. The EF-hand has a helix-loop-helix structure that coordinates the calcium ion. Multiple sequence alignment of CMLs showed the presence of a conserved D-x-D-x-D motif in the fourth EF-hand (Fig. 4, Additional file 4: Figure S2), conserved at the 14th, 16th, and 18th positions. No other EF-hand was found to contain a conserved D-x-D motif. Instead, they contain other conserved amino acids at different positions. The first EF-hand contained a conserved F-x2-F motif at the 5th and 8th positions and a calcium binding D-x3-D motif at the 9th and 13th positions. Glycine (G) was conserved at the 14th position and glutamate (E) at the 20th position in the first EF-hand. Like the first EF-hand, the second EF-hand also contained a conserved D-x3-D motif, at the 13th and 17th positions. Glycine was conserved at the 18th position, and E and F were conserved at the 24th and 25th positions, respectively, in the second EF-hand. In the third EF-hand, F was conserved at the 10th position, while D and E were conserved at the 14th and 25th positions, respectively. In addition to the D-x-D-x-D motif, the fourth EF-hand was also found to contain a conserved F-x-E-F domain. The calcium sensor protein calcium dependent protein kinase (CPK) contains a kinase domain and four calcium binding EF-hands. The EF-hand domain of CPK contains a conserved D-x-D motif in each EF-hand. The D-x-D motifs in the EF-hands of CPKs are conserved at the 14th, 15th and 16th positions, similar to the D-x-D motifs of CaMs, whereas the D-x-D-x-D motif of CML is conserved at the 14th, 16th, and 18th positions of the EF-hand. The molecular structure of CML also revealed the presence of only two calcium binding ligand pockets, located in the C-terminal region of the EF-hand domain (Fig. 5).
These findings indicate that the fourth EF-hand of CML, present in the C-terminal region, is more functional than the other three EF-hands. The two EF-hands of the N-terminal region and the first EF-hand of the C-terminal region (the third EF-hand) do not have any calcium binding ligand pockets. This may be the reason that CMLs underwent evolutionary changes that modified them into CaMs, adding four calcium sensing D-x-D motifs, and hence the CaM genes contain introns. Although the D-x-D motifs were conserved at similar positions in CaMs and CPKs, only the CML contains the D-x-D-x-D motif in the fourth EF-hand, while CaM contains the D-x-D motif in all four EF-hands. These findings show that the EF-hands present in CPKs are much more similar to those of CaMs than to those of CMLs. The presence of four EF-hand domains in CMLs, similar to CaMs and CPKs, together with the absence of a conserved D-x-D motif from all EF-hand domains of CML, shows that they have developed recently and have yet to gain complete structural conservation, unlike CaM and CPK.
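Motifs such as D-x-D and D-x-D-x-D can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch, not the alignment-based procedure used in this study: the function name and the short test fragment are illustrative, and the positions it reports are counted from the start of the fragment supplied, not in the 36-residue EF-hand numbering used above.

```python
import re


def motif_positions(sequence, pattern):
    """Return 1-based start positions of all matches, including overlapping ones.

    `pattern` uses a lowercase 'x' for any residue, e.g. "DxD" or "DxDxD".
    """
    # A lookahead matches without consuming characters, so overlapping
    # occurrences such as D-K-D and D-G-D in "DKDGD" are both reported.
    regex = "(?=" + pattern.replace("x", ".") + ")"
    return [match.start() + 1 for match in re.finditer(regex, sequence)]


# Illustrative EF-hand loop fragment; it is not taken from the study's dataset.
loop = "DKDGDGTITTKE"

print(motif_positions(loop, "DxD"))    # [1, 3]
print(motif_positions(loop, "DxDxD"))  # [1]
```

Applied to each EF-hand of an aligned CaM or CML, a scan of this kind distinguishes hands that carry only D-x-D from a hand that carries the longer D-x-D-x-D motif.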
CMLs contain signal sequences while CaMs do not
Proteomes are larger and more dynamic than genomes because of abundant alternative splicing of genes and the expanded functional and chemical complexity at the protein level due to post-translational modifications. This explains to some extent why larger genomes do not automatically translate into more complex organisms. Post-translational modifications lead to the incorporation of new chemistry and molecular functions that cannot be precisely encoded by gene sequences. Post-translational modification events, including myristoylation and palmitoylation, have incredibly diverse biological functions in signaling, protein trafficking, localization, extracellular communication, protein regulation and metabolism. The co-translational and irreversible addition of myristic acid to an N-terminal glycine residue is known as myristoylation. The N-terminal glycine residues that undergo myristoylation are usually conserved at the second position in the N-terminal region. Therefore, we analyzed the presence of putative myristoylation and palmitoylation sites in CaMs and CMLs using CSS Palm software version 4.0. Our analysis revealed that the CaMs do not contain any palmitoylation or myristoylation sites. However, myristoylation sites were predicted in a few CMLs (CML10, CML21, CML25, CML29, CML33, and CML34) (Additional file 4: Figure S2, highlighted in yellow). Approximately 63 (7.58%) of the 831 studied CMLs were found to contain a glycine (G) residue at the second position of the N-terminal end. The myristoylation motifs found in CMLs were M-G-F, M-G-G and M-G-x (Fig. 6), where the G amino acid was found at the 2nd position of the N-terminal end. The CPKs were also reported to contain conserved myristoylation motifs, including M-G-C and M-G-N, at the N-terminal end. Although the G amino acid was conserved at the second position in CML and CPK, the third position was dynamic.
Palmitoylation and myristoylation events are sometimes correlated, and the absence of myristoylation may abolish palmitoylation. When myristoylation of OsCPK2 was abolished by removing the N-terminal G amino acid, the protein could no longer be palmitoylated. These findings indicate that myristoylation is a prerequisite for palmitoylation. The absence of myristoylation and palmitoylation sites in CaM likely forced it to merge with a kinase domain, resulting in the evolution of CPK, which contains palmitoylation and myristoylation sites in the N-terminal region. Similarly, the presence of myristoylation sites in only a few CMLs shows that these sites have evolved recently in these proteins. Although myristoylation has been shown to be a prerequisite for palmitoylation, the converse is not true. Myristoylation might have occurred independently, without the requirement of a palmitoylation site, because some CMLs carry predicted myristoylation sites although neither CaMs nor CMLs were found to contain any palmitoylation sites.
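The first screening step described above, checking for a glycine at the second position of the N-terminus, can be sketched in a few lines. This is only a positional check and does not reproduce the CSS Palm prediction; the function names and the three sequence fragments are hypothetical and serve purely as illustration.

```python
def n_terminal_motif(sequence):
    """Return the first three residues written like 'M-G-F'."""
    return "-".join(sequence[:3])


def has_glycine_at_position_2(sequence):
    """True if the protein starts with Met followed by Gly (the M-G-x pattern)."""
    return len(sequence) >= 3 and sequence[0] == "M" and sequence[1] == "G"


# Hypothetical N-terminal fragments, for illustration only.
fragments = {
    "candidate_1": "MGFKSLLRR",
    "candidate_2": "MGGLTREQL",
    "candidate_3": "MADQLTDDQ",
}

for name, seq in fragments.items():
    flag = "M-G-x" if has_glycine_at_position_2(seq) else "no G at position 2"
    print(f"{name}: {n_terminal_motif(seq)} ({flag})")

# Share of CMLs with G at position 2, from the counts quoted in the text.
print(f"{100 * 63 / 831:.2f}% of 831 CMLs")
```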
CMLs evolved earlier than the CaMs
Phylogenetic trees were constructed to understand the evolution of CaMs and CMLs. The phylogenetic tree constructed from the protein sequences of CaMs resulted in a single monophyletic clade with different groups, showing that they evolved from a common ancestor of lower eukaryotic plant lineages (Fig. 7). We named the groups I (red), II (green), III (purple), IV (maroon), V (olive), VI (silver), VII (blue), VIII (fuchsia), IX (teal) and X (lime). The majority of CaMs clustered in groups I (red), II (green), III (purple), VIII (fuchsia) and IX (teal). The CaMs of Ostreococcus, Micromonas, Selaginella, Volvox, Chlamydomonas and Picea form the basal root of the phylogenetic tree. These findings show that plant CaMs evolved from a common ancestor of basal lower eukaryotic lineages. The phylogenetic tree constructed from the protein sequences of CMLs revealed the presence of eight monophyletic groups (Fig. 8). We named them groups I (red), II (purple), III (olive), IV (green), V (black), VI (fuchsia), VII (blue) and VIII (lime) (Fig. 8). The CMLs of lower eukaryotes form the basal root of the phylogenetic tree. These findings show that CMLs also evolved from common ancestors of basal lower eukaryotic lineages. Both results show that CaMs and CMLs evolved from a common ancestor. As both CaMs and CMLs evolved from a common ancestor and contain four calcium binding EF-hands, it was important to determine whether CaMs and CMLs coevolved. Therefore, we constructed a phylogenetic tree from the protein sequences of CaMs and CMLs together (Fig. 9). This tree revealed the presence of six monophyletic groups (Fig. 9), which we named groups I (red), II (green), III (blue), IV (purple), V (lime) and VI (fuchsia) (Fig. 9).
The CaMs and CMLs of lower eukaryotic plants form the basal root of the phylogenetic tree, which reflects that CaMs and CMLs evolved together from basal lower eukaryotes. The monophyletic clades were shared by CaMs and CMLs. In the phylogenetic tree, one part (red) is dominated by CMLs (Fig. 9). These findings indicate that these CMLs evolved recently by duplication and then diversified. The trees show that CaMs and CMLs evolved together from their common ancestors and that CMLs are older than CaMs. This is why, during the evolution process, Eucalyptus grandis was able to acquire only one CaM in its genome. The species tree of the studied species shows that the higher plants evolved from their basal ancestors of lower eukaryotic lineage (Fig. 10). To understand the rate of evolution of CaMs and CMLs, evolutionary rates were studied by estimating the gamma parameter for site rates (ML). For CaMs, the substitution pattern and rates were estimated under the Jones-Taylor-Thornton model (+G). A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories, [+G]). Mean evolutionary rates for CaMs in these categories were 0.21, 0.50, 0.81, 1.24, and 2.25 substitutions per site. The amino acid frequencies were 7.69% (A), 5.11% (R), 4.25% (N), 5.13% (D), 2.03% (C), 4.11% (Q), 6.18% (E), 7.47% (G), 2.30% (H), 5.26% (I), 9.11% (L), 5.95% (K), 2.34% (M), 4.05% (F), 5.05% (P), 6.82% (S), 5.85% (T), 1.43% (W), 3.23% (Y), and 6.64% (V). The maximum log likelihood for this computation was −15907.460, and the analysis involved 262 amino acid sequences. There were a total of 139 positions in the final dataset. The mean evolutionary rates for CMLs were 0.18, 0.47, 0.79, 1.24, and 2.32 substitutions per site.
The amino acid frequencies were 7.69% (A), 5.11% (R), 4.25% (N), 5.13% (D), 2.03% (C), 4.11% (Q), 6.18% (E), 7.47% (G), 2.30% (H), 5.26% (I), 9.11% (L), 5.95% (K), 2.34% (M), 4.05% (F), 5.05% (P), 6.82% (S), 5.85% (T), 1.43% (W), 3.23% (Y), and 6.64% (V). The maximum log likelihood for this computation was −63710.347. The analysis involved 824 amino acid sequences. There were a total of 116 positions in the final dataset. In both cases, all positions with less than 95% site coverage were eliminated; that is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. The results show that the substitution rates of CaMs are higher than those of CMLs.
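In a discrete Gamma model with equally probable categories, the category rates are relative rates scaled so that their mean is 1. As a quick consistency check on the category rates quoted above, the sketch below averages the five values for each family; the helper names are illustrative, and the ratio of the fastest to the slowest category is shown only as a rough indication of rate heterogeneity among sites.

```python
# Category rates quoted in the text (5 discrete Gamma categories each).
cam_rates = [0.21, 0.50, 0.81, 1.24, 2.25]
cml_rates = [0.18, 0.47, 0.79, 1.24, 2.32]


def mean_rate(rates):
    """Average relative rate across equally weighted categories."""
    return sum(rates) / len(rates)


def rate_spread(rates):
    """Ratio of the fastest to the slowest category rate."""
    return max(rates) / min(rates)


for label, rates in (("CaM", cam_rates), ("CML", cml_rates)):
    print(f"{label}: mean = {mean_rate(rates):.3f}, spread = {rate_spread(rates):.1f}x")
```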
CaM and CMLs are differentially expressed in different tissues
CaMs and CMLs are reportedly involved in diverse cellular processes, including signaling and different biotic and abiotic stress responses. Different stress responses have varying effects on different parts of the plant. Therefore, tissue specific expression of genes also has a large impact in regulating stress conditions. The presence of tissue specific expression data in the Phytozome database led us to analyze the expression of CaMs and CMLs of G. max and P. vulgaris. The data revealed that the relative abundance of GmCaM5 and GmCaM8-2 was higher in all the studied tissues (pod, nodule, flower, stem, leaves and roots) compared to the rest of the GmCaMs (Table 4). The highest abundances of GmCaM5 were 20.16 in nodules and 18.85 in stems. Similarly, the highest abundances of GmCaM8-2 were 19.62 in leaves and 80.77 in roots, followed by GmCaM9-4, which was also relatively highly expressed in roots (Table 4). GmCaM9-1 and GmCaM9-2 were expressed only in roots. Most PvulCaM genes of P. vulgaris were expressed in all tissues, with the exception of PvulCaM9-2. The abundance of PvulCaM5 was higher than that of the others in all tissues. PvulCaM7-2 and PvulCaM7-3 were highly expressed in all tissues, but to a lesser extent than PvulCaM5 (Table 4). Although other PvulCaMs were expressed in all tissues, their expression levels were comparatively lower than those of PvulCaM5, PvulCaM7-2, and PvulCaM7-3. Overall, the results show that CaM5 was highly expressed in all tissues in G. max and P. vulgaris, while GmCaM8-2 was highly expressed in roots. Similarly, PvulCaM7-2 and PvulCaM7-3 were ubiquitously expressed in all tissues.
Compared to the CaMs, CMLs were expressed at relatively lower levels in different tissues (Table 5). GmCML20 was highly expressed in nodules (51.11) and flowers (54.53), while GmCML3-3 was highly expressed in pods (34.97), nodules (40.57), stems (157.21), and roots (124.09) (Table 5). GmCML5-3 was highly expressed in stems and flowers, while GmCML3-4 and GmCML3-5 were not expressed in pods, nodules, flowers, or leaves. Similarly, GmCML15-2 and GmCML15-3 were not expressed in pods, nodules, flowers, stems, leaves or roots (Table 5). GmCML15-1 was slightly expressed in nodules, flowers, stems, leaves and roots, whereas GmCML16 was not expressed in leaves but was slightly expressed in other tissues. Additionally, GmCML25 was not expressed in flowers but was slightly expressed in other tissues. Similarly, GmCML20 was not expressed in pods and was slightly expressed in other tissues.
When compared to PvulCaMs, PvulCMLs were also expressed at relatively lower levels. PvulCML3-3 was ubiquitously expressed in pods (31.86), nodules (39.28), flowers (53.71), stems (87.46), leaves (8), and roots (73.65) (Table 5), while PvulCML3-2 was expressed at significantly higher levels in nodules (72.86), flowers (30.79), stems (36.36), leaves (7.38) and roots (38.76), but to a lesser extent than PvulCML3-1. PvulCML38-3 was highly expressed in pods (74.9) and roots (38.55), followed by PvulCML25-3 in pods (34.2), flowers (96.27), stems (38.21), and roots (38.55) (Table 5). PvulCML20 was highly expressed in pods (33.01), nodules (35.89), flowers (73.85), stems (48.62), leaves (27.89) and roots (37.04), while PvulCML3-4 was not expressed in pods, nodules, stems, leaves and roots but was relatively highly expressed in flowers (3.58) (Table 5). Similarly, PvulCML15 was not expressed in pods, nodules, stems, leaves and roots but was relatively highly expressed in flowers (4.71). Investigation of the expression of G. max and P. vulgaris CMLs revealed that CML3 and CML20 were expressed in all tissues in both plants, while CML3-4 and CML15 (CML15-2 and CML15-3 in the case of G. max) were largely not expressed in either plant.
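Tissue profiles of this kind are easy to summarize programmatically. The sketch below stores three of the profiles quoted above (values as given in the text; tissues not quoted for a gene are simply omitted) and reports the tissue with the highest abundance for each gene. The dictionary layout and function name are illustrative.

```python
# Expression values as quoted in the text (Table 5); tissues without a
# quoted value for a gene are left out rather than guessed.
expression = {
    "GmCML3-3": {"pod": 34.97, "nodule": 40.57, "stem": 157.21, "root": 124.09},
    "PvulCML3-3": {"pod": 31.86, "nodule": 39.28, "flower": 53.71,
                   "stem": 87.46, "leaf": 8.0, "root": 73.65},
    "PvulCML20": {"pod": 33.01, "nodule": 35.89, "flower": 73.85,
                  "stem": 48.62, "leaf": 27.89, "root": 37.04},
}


def peak_tissue(profile):
    """Return (tissue, value) for the tissue with the highest expression."""
    tissue = max(profile, key=profile.get)
    return tissue, profile[tissue]


for gene, profile in expression.items():
    tissue, value = peak_tissue(profile)
    print(f"{gene}: highest in {tissue} ({value})")
```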
The CaM and CML gene families from 41 plant species were studied. The study shows the presence of four calcium binding D-x-D motifs in CaMs and one D-x-D-x-D motif in CMLs. The numbers of CaM and CML gene family members vary significantly and do not correlate with the genome size of the organism. The evolutionary study shows that CMLs evolved earlier than CaMs and diversified later. Tissue specific expression of CaMs and CMLs shows that these genes play important roles in the development of different tissues in G. max and P. vulgaris.
Identification of CaM and CML gene family
The calmodulin and calmodulin-like genes of Arabidopsis thaliana and Oryza sativa were downloaded from the “Arabidopsis Information Resource” database and the “Rice Genome Annotation Project”, respectively. The protein sequences of the CaMs and CMLs of A. thaliana and O. sativa were used as query sequences in the publicly available Phytozome database to identify the protein sequences of CaMs and CMLs of other plant species using the BLASTP program. The CaM and CML genes of Picea abies were downloaded from the spruce genome project. Overall, 41 plant species were considered during the study (Table 1). The parameters used during BLASTP searches were: target type, proteome; expect (E) threshold, −1; and comparison matrix, BLOSUM62. Sequences recovered from the BLASTP searches were collected for further analysis. All collected sequences were then evaluated using the ScanProsite software to confirm the presence of the PROSITE calcium binding EF-hand domain. Sequences that showed the presence of four calcium binding EF-hand domains were considered CaM or CML proteins. Later, all sequences were subjected to BLASTP analysis against the A. thaliana (TAIR) and O. sativa (Rice Genome Annotation Project) proteome databases. Sequences whose BLASTP hits were CaM genes in both databases were considered CaM proteins, while those whose hits were CMLs were considered CML proteins.
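The classification rule described above can be written down as a small decision function. This is a sketch under stated assumptions: it takes the EF-hand count and the family of the best reciprocal hit in each database as already-computed inputs, the candidate records are hypothetical, and the handling of mixed results (CaM in one database, CML in the other) is an assumption, since the text does not describe that case.

```python
def classify(n_ef_hands, tair_best_hit, rice_best_hit):
    """Label a candidate as 'CaM', 'CML', 'unresolved' or None.

    n_ef_hands     -- number of EF-hand domains confirmed for the sequence
    tair_best_hit  -- family of the best hit in the A. thaliana proteome
    rice_best_hit  -- family of the best hit in the O. sativa proteome
    """
    if n_ef_hands != 4:
        return None  # only four-EF-hand proteins are retained
    hits = (tair_best_hit, rice_best_hit)
    if all(hit == "CaM" for hit in hits):
        return "CaM"
    if all(hit == "CML" for hit in hits):
        return "CML"
    return "unresolved"  # ASSUMPTION: mixed hits are flagged, not assigned


# Hypothetical candidates: (identifier, EF-hands, TAIR hit, rice hit).
candidates = [
    ("seq_001", 4, "CaM", "CaM"),
    ("seq_002", 4, "CML", "CML"),
    ("seq_003", 3, "CML", "CML"),
    ("seq_004", 4, "CaM", "CML"),
]

for identifier, ef_hands, tair, rice in candidates:
    print(identifier, classify(ef_hands, tair, rice))
```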
Subsequently, we named all the CaM and CML proteins of the studied plant species. Nomenclature followed the orthology-based naming system proposed earlier [8, 25]. Each name consists of the first letter of the genus name in upper case and the first letter of the species name in lower case (two to three letters were sometimes used when redundancy was observed), followed by the number of the orthologous gene of A. thaliana or O. sativa. Monocot species were named according to the orthologous genes of O. sativa, while dicots and other species were named according to the orthologous genes of A. thaliana, as proposed earlier [8, 25, 26].
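The naming rule can be illustrated with a short Python sketch. The function name `ortholog_name` and its parameters are hypothetical, and the exact layout of the final names (for example, how paralogs of the same ortholog are distinguished) is an assumption not specified in the text.

```python
def ortholog_name(genus: str, species: str, family: str,
                  ortholog_number: int, species_letters: int = 1) -> str:
    """Build a name such as GmCaM1 from genus, species and ortholog number.

    species_letters can be raised to 2 or 3 when two species would
    otherwise share the same prefix.
    """
    if not genus or not species:
        raise ValueError("genus and species must be non-empty")
    # First letter of the genus in upper case, then the species letters in lower case.
    prefix = genus[0].upper() + species[:species_letters].lower()
    return f"{prefix}{family}{ortholog_number}"


if __name__ == "__main__":
    print(ortholog_name("Glycine", "max", "CaM", 1))
    print(ortholog_name("Phaseolus", "vulgaris", "CML", 10))
```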
Molecular modeling of CaM and CML
Molecular modeling was conducted to evaluate the structural details of the CaM and CML proteins. The Geno3D software was used to construct the molecular structures of CaMs and CMLs. The protein sequences of AtCaM1 and AtCML1 were used as query sequences to search for the model. The following parameters were used to run the analysis: database, non-redundant protein sequences; filter query sequence (−F), true; expectation value (−e, real), 10.0; number of one-line descriptions (−v, int), 500; number of alignments to show (−b, int), 500; matrix (−M), BLOSUM62; expectation value threshold for inclusion in multipass model (−h, real), 0.002; maximum number of passes to use in multipass version (−j, int), 3.
Multiple sequence alignment
Multiple sequence alignment of the CaM and CML proteins was conducted separately to investigate the presence of conserved domains and motifs. The Multalin software was used to run the alignments. The parameters used were: sequence input format, Multalin-fasta; protein weight matrix, BLOSUM62-12-2; gap penalty at opening, default; gap penalty at extension, default; gap penalties at extremities, none; one iteration only, no; high consensus level, 90%; low consensus level, 50%.
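To show what the high (90%) and low (50%) consensus levels mean in practice, the following Python sketch derives a consensus line from an already aligned set of sequences. It is a simplified approximation and not Multalin's own algorithm: the function name `consensus` is hypothetical, residue-group symbols are ignored, and the convention of upper case for high consensus, lower case for low consensus and "." otherwise is assumed for illustration.

```python
from collections import Counter


def consensus(alignment, high=0.90, low=0.50):
    """Return a consensus string for equal-length aligned sequences."""
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for column in zip(*alignment):
        # Ignore gap characters when counting residues.
        residues = [c.upper() for c in column if c not in "-."]
        if not residues:
            out.append(".")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        fraction = count / len(column)  # share of all sequences, gaps included
        if fraction >= high:
            out.append(residue)           # high consensus: upper case
        elif fraction >= low:
            out.append(residue.lower())   # low consensus: lower case
        else:
            out.append(".")               # no consensus
    return "".join(out)


if __name__ == "__main__":
    # Made-up toy alignment, not taken from the study.
    toy = ["DKDGA", "DKDAC", "DRDGE", "DKNGF"]
    print(consensus(toy))
```

In the toy alignment, only the first column is fully conserved, so it is the only position reported in upper case.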
Palmitoylation site prediction
The palmitoylation sites of the CaM and CML proteins were predicted using the CSS-Palm software, version 2.0. For the prediction, input sequences were submitted in FASTA format and the threshold was set to high or medium.
Phylogenetic trees were constructed to understand the evolution of CaMs and CMLs. To construct the trees, the protein sequences were aligned with ClustalW or Clustal Omega to generate a Clustal file. The Clustal files were then converted to the MEGA file format using the MEGA6 software, and the resulting MEGA files of CaMs and CMLs were used to construct the phylogenetic trees. The parameters used to construct the trees were as follows: analysis, phylogeny reconstruction; statistical method, maximum likelihood; test of phylogeny, bootstrap method; no. of bootstrap replicates, 1000; substitution type, amino acid; model/method, Jones-Taylor-Thornton (JTT) model; gaps/missing data treatment, partial deletion; site coverage cutoff (%), 95; ML heuristic method, nearest-neighbor-interchange (NNI); and branch swap filter, very strong. The gamma parameter for site rates was estimated using MEGA6. The following parameters were used to study the site rates: analysis, estimate rate variation among sites (ML); statistical method, maximum likelihood; substitution type, amino acid; model/method, Jones-Taylor-Thornton (JTT); rates among sites, gamma distributed (G); number of discrete gamma categories, 5; gaps/missing data treatment, partial deletion; site coverage cutoff (%), very strong. The species tree was built using the NCBI taxonomy browser (http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi).
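The effect of the "partial deletion" option with a 95% site coverage cutoff can be sketched in Python: alignment columns are kept only if at least 95% of the sequences have a residue at that site. This is an illustrative approximation of the idea, not MEGA6 code; the function name `partial_deletion` and the choice of "-" and "?" as gap or missing-data characters are assumptions.

```python
def partial_deletion(alignment: dict, coverage_cutoff: float = 0.95,
                     gap_chars: str = "-?") -> dict:
    """Drop alignment columns whose site coverage is below the cutoff.

    alignment maps sequence names to equal-length aligned sequences.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i, column in enumerate(zip(*seqs)):
        # Coverage = fraction of sequences with a real residue at this site.
        covered = sum(1 for c in column if c not in gap_chars)
        if covered / len(column) >= coverage_cutoff:
            keep.append(i)

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Made-up toy alignment, not taken from the study.
    toy = {"a": "MAD-", "b": "MADQ", "c": "M-DQ"}
    print(partial_deletion(toy))
```

With only three toy sequences, any column containing a gap falls below 95% coverage and is removed; with the larger data sets used in a real analysis, a column can tolerate a few gaps before being dropped.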
Tissue specific expression of CaMs and CMLs
Understanding the tissue-specific expression of a gene is important for elucidating its role in growth, development and stress responses. Therefore, we studied the tissue-specific expression of the CaM and CML genes of G. max and P. vulgaris. The expression profiles of CaMs and CMLs were retrieved from the PhytoMine database of Phytozome. The expression profiles of all genes are reported as FPKM (fragments per kilobase of exon per million reads mapped).
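For readers unfamiliar with the unit, the standard normalization of fragments per kilobase of exon per million mapped reads can be written as a small Python function. The expression values in this study were taken from the database rather than computed by the authors, so the function name `fpkm` and the numbers in the example are purely illustrative.

```python
def fpkm(fragments: float, exon_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    FPKM = fragments * 1e9 / (exon_length_bp * total_mapped_fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and total mapped fragments must be positive")
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Made-up numbers: 500 fragments on a 2 kb transcript in a 10-million-fragment library.
    print(fpkm(500, 2000, 10_000_000))
```

Normalizing by both transcript length and library size is what makes values comparable across genes of different lengths and across tissues sequenced to different depths.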
Regression analysis was conducted to evaluate the correlation between CaM and CML gene family size and genome size. Mathportal (http://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php) was used for the correlation and regression analyses.
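The same kind of correlation and simple linear regression can be reproduced with a few lines of Python. The sketch below assumes a Pearson correlation and an ordinary least-squares fit, which is what such calculators typically report; the function name `pearson_and_regression` is hypothetical and the example values are placeholders, not data from the study.

```python
import math


def pearson_and_regression(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain more than one distinct value")

    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
    slope = sxy / sxx                       # least-squares slope of y on x
    intercept = mean_y - slope * mean_x     # least-squares intercept
    return r, slope, intercept


if __name__ == "__main__":
    # Placeholder values only: x could be genome size, y gene family size.
    genome_size = [1.0, 2.0, 3.0, 4.0]
    family_size = [3.0, 5.0, 7.0, 9.0]
    print(pearson_and_regression(genome_size, family_size))
```

A coefficient near zero for real data would correspond to the reported lack of correlation between family size and genome size.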
CDPK: Calcium-dependent protein kinase
CRK: CDPK-related kinase
Clapham DE. Calcium signaling. Cell. 2007;131:1047–58.
Cameron AGW. Nuclear Reactions in Stars and Nucleogenesis. Publications of the Astronomical Society of the Pacific. 1957;69:201.
Kanchiswamy CN, Mohanta TK, Capuzzo A, Occhipinti A, Verrillo F, Maffei ME, et al. Differential expression of CPKs and cytosolic Ca2+ variation in resistant and susceptible apple cultivars (Malus x domestica) in response to the pathogen Erwinia amylovora and mechanical wounding. BMC Genomics. 2013;14:760.
Mohanta TK, Sinha AK. Role of Calcium-Dependent Protein Kinases during Abiotic Stress Tolerance. In: Tuteja N, Gill S, editors. Abiotic Stress Response in Plants. Wiley-VCH Verlag GmbH & Co.; 2015. p. 185–208.
Mohanta TK, Mohanta N, Mohanta YK, Parida P, Bae H. Genome-wide identification of Calcineurin B-Like (CBL) gene family of plants reveals novel conserved motifs and evolutionary aspects in calcium signaling events. BMC Plant Biol. 2015;15:189.
McCormack E, Braam J. Calmodulins and related potential calcium sensors of Arabidopsis. New Phytol. 2003;159:585–98.
McCormack E, Tsai YC, Braam J. Handling calcium signaling: Arabidopsis CaMs and CMLs. Trends Plant Sci. 2005;10:383–9.
Mohanta TK, Mohanta N, Mohanta YK, Bae H. Genome-wide identification of calcium dependent protein kinase gene family in plant lineage shows presence of novel D-x-D and D-E-L motifs in EF-hand domain. Front Plant Sci. 2015;6:1146.
Mohanta TK, Mohanta N, Mohanta YK, Parida P, Bae H. Genome-wide identification of Calcineurin B-Like (CBL) gene family of plants reveals novel conserved motifs and evolutionary aspects in calcium signaling events. BMC Plant Biol. 2015;15:189.
Konstantinidis KT, Tiedje JM. Trends between gene content and genome size in prokaryotic species with larger genomes. Proc Natl Acad Sci U S A. 2004;101:3160–5.
Fischer I, Dainat J, Ranwez V, Glémin S, Dufayard J-F, Chantret N. Impact of recurrent gene duplication on adaptation of plant genomes. BMC Plant Biol. 2014;14:151.
Lespinet O, Wolf YI, Koonin EV, Aravind L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 2002;12:1048–59.
Guo YL. Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J. 2013;73:941–51.
Mattick J. Introns: evolution and function. Curr Opin Genet Dev. 1994;4:823–31.
Rogozin IB, Carmel L, Csuros M, Koonin EV. Origin and evolution of spliceosomal introns. Biol Direct. 2012;7:11.
Carmel L, Wolf YI, Rogozin IB, Koonin EV. Three distinct modes of intron dynamics in the evolution of eukaryotes. 2007. p. 1034–44.
Lin H, Zhu W, Silva JC, Gu X, Buell CR. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol. 2006;7:R41.
Hartung F, Blattner FR, Puchta H. Intron gain and loss in the evolution of the conserved eukaryotic recombination machinery. Nucleic Acids Res. 2002;30:5175–81.
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. 2004;32:3724–33.
Martín ML, Busconi L. A rice membrane-bound calcium-dependent protein kinase is activated in response to low temperature. Plant Physiol. 2001;125:1442–9.
Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 2012;40:D1202–10.
Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, et al. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 2007;35:D883–7.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, et al. The Norway spruce genome sequence and conifer genome evolution. Nature. 2013;497:579–84.
Mohanta TK, Arora PK, Mohanta N, Parida P, Bae H. Identification of new members of the MAPK gene family in plants shows diverse conserved domains and novel activation loop variants. BMC Genomics. 2015;16:58.
Hamel L-P, Nicole M-C, Sritubtim S, Morency M-J, Ellis M, Ehlting J, et al. Ancient signals: comparative genomics of plant MAPK and MAPKK gene families. Trends Plant Sci. 2006;11:192–8.
Combet C, Jambon M, Deléage G, Geourjon C. Geno3D: automatic comparative molecular modelling of protein. Bioinformatics. 2002;18:213–4.
Ren J, Wen L, Gao X, Jin C, Xue Y, Yao X. CSS-Palm 2.0: an updated software for palmitoylation sites prediction. Protein Eng Des Sel. 2008;21:639–44.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
This work was carried out with the support of the Next-Generation Biogreen 21 Program (PJ011113), Rural Development Administration, Korea.
The funding agency had no role in the design of the study or in the collection, analysis, and interpretation of data.
Availability of data and materials
All data analyzed during this study were taken from the publicly available Phytozome database and are also provided as Additional files.
TKM conceived the idea, designed the experiment, analyzed the data and drafted the manuscript; PK and HB revised the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Calmodulin (CaM) gene family members of monocot, dicot and lower eukaryotic plant lineages. The table shows the gene name, locus ID, open reading frame (ORF), number of introns and 5'-3' coordinates of the CaM genes. (DOC 428 kb)
Calmodulin-like (CML) gene family members of monocot, dicot and lower eukaryotic plant lineages. The table shows the gene name, locus ID, open reading frame (ORF), number of introns and 5'-3' coordinates of the CML genes. (DOC 1223 kb)
Multiple sequence alignment of plant CaM proteins. The alignment shows the presence of conserved motifs in the EF-hand domain. (PDF 817 kb)
Multiple sequence alignment of plant CML proteins. The alignment shows the presence of conserved motifs in the EF-hand domain. (PDF 441 kb)