Genome-wide analysis of the cellulose synthase-like (Csl) gene family in bread wheat (Triticum aestivum L.)
BMC Plant Biology volume 17, Article number: 193 (2017)
Hemicelluloses are a diverse group of complex, non-cellulosic polysaccharides, which constitute approximately one-third of the plant cell wall and find use as dietary fibres, food additives and raw materials for biofuels. Genes involved in hemicellulose synthesis have not been extensively studied in small grain cereals.
In efforts to isolate the sequences for the cellulose synthase-like (Csl) gene family from wheat, we identified 108 genes (hereafter referred to as TaCsl). Each gene was represented by two to three homeoalleles, which are named as TaCslXY_ZA, TaCslXY_ZB, or TaCslXY_ZD, where X denotes the Csl subfamily, Y the gene number and Z the wheat chromosome where it is located. A quarter of these genes were predicted to have 2 to 3 splice variants, resulting in a total of 137 putative translated products. Approximately 45% of TaCsl genes were located on chromosomes 2 and 3. Sequences from the subfamilies C and D were interspersed between the dicots and grasses but those from subfamily A clustered within each group of plants. Proximity of the dicot-specific subfamilies B and G, to the grass-specific subfamilies H and J, respectively, points to their common origin. In silico expression analysis in different tissues revealed that most of the genes were expressed ubiquitously and some were tissue-specific. More than half of the genes had introns in phase 0, one-third in phase 2, and a few in phase 1.
Detailed characterization of the wheat Csl genes has enhanced the understanding of their structural, functional, and evolutionary features. This information will be helpful in designing experiments for genetic manipulation of hemicellulose synthesis with the goal of developing improved cultivars for biofuel production and increased tolerance against various stresses.
The plant cell wall consists of three main polysaccharide fractions: cellulose, hemicellulose, and pectin, with lignin and proteins being the other two constituents. Grass walls contain mainly two of the three polysaccharide fractions, with pectin being a rather minor constituent. Hemicelluloses are plant cell wall matrix polysaccharides that possess diverse linear or branched structures [1, 2]. These mainly encompass 1–4-β-glucan, 1,3;1,4-β-glucan, galactan, and glucomannan in grasses. In addition, glucuronoarabinoxylan is a major grass wall constituent. Because of the presence of heterogeneous substituents or other linkages on their polymer backbones, hemicelluloses are non-crystalline and can be readily hydrolysed in comparison to cellulose. These polysaccharides can interact with cellulose microfibrils through hydrogen bonds.
Hemicellulosic polysaccharide backbones in plants are made by the cellulose synthase-like (Csl) enzymes, which are members of a much larger superfamily of genes referred to as glycosyltransferase 2 (GT2). Several other GTs, i.e., xyloglucan α-1,6-xylosyltransferases (GT34), xyloglucan fucosyltransferases (GT37), and xyloglucan galactosyltransferases (GT47), have been reported to be involved in the biosynthesis of xyloglucans. Genes encoding Csl enzymes share sequence similarity with the cellulose synthase A (CesA) gene family known to form cellulose throughout the plant kingdom. A variable number of Csl genes ranging from 30 to 50 have been reported from different plant species and are classified into nine subfamilies (CslA–CslH and CslJ) [8, 9]. Cereals generally lack the CslB and CslG families. Among the remaining families, CslA, CslC, and CslD are conserved in all land plants, whereas CslF and CslH are restricted to grasses [10, 11]. A poorly understood subfamily, CslJ, has been reported in grasses as well as dicots, which contrasts with the previous claims of its occurrence only in grasses [12, 13]. Similarly, the subfamilies CslB and CslG were previously reported to be specific to dicots. However, a recent report established the presence of the CslB subfamily in monocots as well. Several of the Csl subfamilies have been reported to be involved in the biosynthesis of different cell wall polysaccharides. For example, subfamily CslA was shown to form the β-1,4-mannan backbone of galactomannan and glucomannan [15, 16]. Similarly, the CslF and CslH subfamilies were shown to make 1–3;1–4-β-glucan in grasses [17, 18], whereas CslC genes were associated with the formation of the 1–4-β-glucan backbone of xyloglucan and some other polysaccharides.
Wheat is a major cereal crop grown on the largest area of arable land in the world, is second only to maize in grain production, and feeds approximately 40% of the world population. It has a large genome (~17 Gb), of which ~80–90% is repetitive. Even after the complete genome sequence became available, Csl genes remain unidentified and uncharacterized in bread wheat. In general, homeologous copies of most genes are located on each of the three chromosomes belonging to each of the subgenomes (A, B, and D), suggesting that the number of Csl genes is expected to be approximately three times that of a diploid species like rice. We used publicly available resources to retrieve the wheat genome sequence. Large-scale data mining was performed using the Pfam domain models for the identification of Csl gene family members, which are reported in this study.
Data sources and sequence retrieval
Wheat genome data, generated by the International Wheat Genome Sequencing Consortium (IWGSC), were downloaded from the Ensembl Plants FTP server (ftp://ftp.ensemblgenomes.org/pub/current/plants/fasta/triticum_aestivum/) and converted into a local BLAST database using a UNIX pipeline. BLAST analyses (BLASTN as well as BLASTP) were performed using the stand-alone command line version of NCBI (National Center for Biotechnology Information) blast 2.2.28+ (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/), released March 19, 2013. A query file was generated from two Pfam domain models, PF00535 (GT2) and PF03552 (Cellulose_synt), downloaded from the Pfam 30.0 release (June 2016). The sequences of splice variants were also retrieved from the Ensembl Plants browser (http://plants.ensembl.org/Triticum_aestivum/Info/Index). Analysis of splice variants was conducted as described by Kim et al. (2007). Previously known Csl sequences from Arabidopsis thaliana, Oryza sativa, and Zea mays were downloaded from the Cell Wall Navigator database. For Brachypodium, sequences were retrieved from PhytoMine (https://phytozome.jgi.doe.gov). Amino acid sequences of the aforementioned CSL proteins are given in Additional file 1: Figure S1.
Blast searches for wheat homologs
All query files containing the two Pfam domain models (PF00535 and PF03552) were used to perform the BLASTn searches against the local BLAST database of bread wheat. All BLAST hits with an E-value >1.0 were removed. Using the cut-off E-value <1.0, all previously known CesA genes were retrieved. After the compilation of all the sequences below the cut-off value, the CD-hit program was used to obtain non-redundant sequences. A high cut-off E-value was used to ensure the identification of all the genes that possessed the Pfam domains PF00535 and PF03552. These genes were further filtered through phylogenetic analysis along with previously known CSL proteins from Arabidopsis, Brachypodium, maize, and rice, which revealed some non-target genes that were removed from further analysis. Phylogenetic analysis was also used to categorize the different Csl subfamilies. CesA genes were distinguished from the Csl genes by the CXC motif, which is diagnostic of the CesA proteins but absent from the Csl proteins [7, 27]. Presence of the conserved domains Cellulose_synt/GT2 was confirmed using a batch BLAST search at the CDD (Conserved Domain Database) of NCBI. Homeologous genes from each of the three genomes were named TaCslXY_ZA, TaCslXY_ZB, or TaCslXY_ZD, where X denotes the Csl subfamily, Y the gene number and Z the wheat chromosome where it is located. Alignment of the sequences of all newly identified wheat Csl genes is given in Additional file 2: Figure S2.
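The E-value filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the search was written in the 12-column tabular format (BLAST+ `-outfmt 6`), in which the E-value is the eleventh column. The function name and the sample rows are hypothetical.

```python
import csv
import io


def filter_blast_hits(handle, max_evalue=1.0):
    """Yield (query, subject, evalue) for hits with E-value <= max_evalue.

    Assumes the default 12-column tabular layout of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            yield row[0], row[1], evalue


# Hypothetical rows for illustration only (not real search results).
sample = (
    "PF03552_q\tgene_a\t45.0\t200\t110\t0\t1\t200\t5\t204\t1e-50\t180\n"
    "PF03552_q\tgene_b\t25.0\t40\t30\t0\t1\t40\t9\t48\t3.2\t22\n"
)
hits = list(filter_blast_hits(io.StringIO(sample)))
```

With the permissive cut-off of 1.0 used in the text, only the second hypothetical row (E-value 3.2) is discarded. Redundancy removal and the phylogenetic filtering are separate later steps and are not covered here.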
Protein structure and motif/domain identification
Protein sequences were downloaded from the Ensembl Plants FTP server (ftp://ftp.ensemblgenomes.org/pub/current/plants/fasta/triticum_aestivum/), developed by the International Wheat Genome Sequencing Consortium (IWGSC). Multiple protein sequence alignments were performed using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/). The resulting alignments were analysed for the presence of conserved motifs (D, D, DXD, QXXRW) of the GT2 superfamily. Conserved patterns of aligned sequences were highlighted using the Sequence Manipulation Suite: Color Align Conservation (http://www.bioinformatics.org/sms2/color_align_cons.html). The conserved domains were predicted using the CDD database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 30, 31]. Wheat Csl genes were named based on their sequence identity, coverage, and presence of conserved domains and motifs similar to those of the previously identified rice Csl genes. Where the number of genes in a subfamily exceeded that of rice, the additional genes were given new names. Because of the resemblance of CslD genes to CesA genes and their probable role in cellulose synthesis, we specifically focused on the TaCslD subfamily. Gene structures and intron evolution of TaCslD members were predicted from the genomic and cDNA sequences using the Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/).
Evolutionary relationships of Csl genes
A total of 215 CSL proteins from Arabidopsis, maize, rice and wheat were aligned using MAFFT (v1.3.6). Sequences that did not extend over the conserved core region were removed. Positions where more than 40% of the sequences contained a gap were also removed. The phylogeny and 1000 bootstrap replications of these sequences were inferred using Seqboot (v3.696) and FastTree (v2.1.10) implemented on the Guillimin cluster.
The phylogeny of the CslD subfamily was also determined separately from Arabidopsis, Brachypodium, maize, rice and wheat. For phylogenetic analysis, the amino acid sequences of CSL proteins were aligned using MUSCLE and their evolutionary history was inferred using the Neighbor-Joining method. The tree was drawn to scale, with branch lengths being equivalent to the evolutionary distances used to infer the phylogenetic tree. Evolutionary distances were computed with a Poisson correction and are given as the number of amino acid substitutions per site. The rate of variation among sites was modeled with a gamma distribution (shape parameter = 1) and all positions containing gaps and missing data were removed. Evolutionary analyses were conducted in MEGA6.
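The gap-based column filter mentioned above (dropping positions where more than 40% of the sequences contain a gap) can be illustrated with a short sketch. The function name and the toy alignment are hypothetical, and the tool actually used for this step is not stated in the text.

```python
def drop_gappy_columns(aligned, max_gap_fraction=0.40):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    `aligned` maps sequence names to aligned sequences of equal length,
    with '-' marking a gap.
    """
    names = list(aligned)
    seqs = [aligned[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


# Toy alignment (hypothetical): the third column has gaps in 3 of 5 sequences.
toy = {
    "seq1": "MK-LD",
    "seq2": "MK-LD",
    "seq3": "M-ALD",
    "seq4": "MKA-D",
    "seq5": "MK-LD",
}
trimmed = drop_gappy_columns(toy)
```

In the toy alignment only the third column (60% gaps) is removed. Columns with 20% gaps are kept, so individual gaps can remain in the trimmed sequences.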
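As a worked illustration of the distance measures named above, the sketch below computes the proportion of differing sites p between two aligned sequences, the Poisson-corrected distance d = -ln(1 - p), and the textbook gamma-corrected form d = a[(1 - p)^(-1/a) - 1], which reduces to p/(1 - p) for shape parameter a = 1. That MEGA6 uses exactly this gamma form is an assumption here, and the function names and toy sequences are hypothetical. Gapped or missing positions are dropped pairwise, which is a simplification of deleting such positions across the whole alignment.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring positions with '-' or '?'."""
    pairs = [
        (a, b) for a, b in zip(seq_a, seq_b)
        if a not in "-?" and b not in "-?"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected number of substitutions per site."""
    return -math.log(1.0 - p)


def gamma_distance(p, shape=1.0):
    """Gamma-corrected distance; shape is the gamma shape parameter."""
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


# Toy pair (hypothetical): 5 comparable sites, 1 difference, so p = 0.2.
p = p_distance("MKLD-A", "MKVDQA")
d_poisson = poisson_distance(p)
d_gamma = gamma_distance(p, shape=1.0)
```

For p = 0.2 the Poisson correction gives about 0.223 and the gamma correction with shape 1 gives 0.25, showing how rate variation among sites inflates the estimated distance.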
RNA-seq expression analysis
Publicly available RNA-seq data generated from bread wheat (var. Chinese Spring) were used to study the expression of newly identified wheat Csl genes. The data were compiled from five different wheat tissues (spike, leaf, stem, root, and grain) collected at seedling, vegetative and reproductive stages of development. The relative expression of each TaCsl subfamily was presented as a heat map generated from the relative abundance of transcripts (per 10 million reads) for each gene using the wheat expression browser powered by expVIP (http://www.wheat-expression.com).
Identification and classification of Csl gene family members in bread wheat
Database searches for bread wheat using conserved Pfam motifs PF00535 and PF03552, which are specific to the GT2 superfamily, resulted in the identification of 108 cellulose synthase-like (TaCsl) genes (Table 1). Two to three homeologous copies of each gene from the A, B and D genomes were common. The identified genes were named following the nomenclature of rice, which shares synteny with wheat. To avoid the complexity of the nomenclature, a suffix corresponding to the chromosome number and the specific wheat genome identifier (A, B, or D) has been used for each gene name. For example, the first gene of subfamily CslA, CslA1, on the long arm of chromosome 1 of genomes A, B, and D is named TaCslA1_1AL, TaCslA1_1BL, and TaCslA1_1DL, respectively.
An unrooted neighbor-joining (NJ) tree for the 215 derived Csl proteins from Arabidopsis, maize, rice and wheat is shown in Fig. 1. TaCsl proteins grouped into seven subfamilies: TaCslA (32 proteins), TaCslC (13 proteins), TaCslD (12 proteins), TaCslE (10 proteins), TaCslF (29 proteins), TaCslH (8 proteins), and TaCslJ (4 proteins) (Fig. 2). The TaCslA and TaCslC subfamilies were closely related as shown by their taxonomic distribution and phylogenies. As expected, these subfamilies were conserved across the plant species. Although TaCslD is present in all the plant species whereas TaCslF is specific to grasses, their proximity to each other suggests a common origin. Among the sequences common to both dicots and grasses, subfamily CslA appeared to be the most divergent between these two groups of plants. Whereas the sequences within the subfamilies CslC and CslD were interspersed between Arabidopsis and grasses, all the subfamily CslA sequences of Arabidopsis clustered together, separately from the grass CslA sequences. Proximity of the CslB and CslH subfamilies points to their common origin before the separation of grasses from dicots. Similarly, CslG and CslJ apparently had a common origin.
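The naming scheme illustrated by TaCslA1_1AL can be decoded mechanically. The sketch below is a hypothetical helper, not part of the study: it assumes names consist of the prefix TaCsl, a subfamily letter, a gene number, an underscore, a chromosome number (1 to 7), a subgenome letter (A, B or D) and an optional arm letter (L or S), as in the names used in this article.

```python
import re
from typing import NamedTuple


class CslName(NamedTuple):
    subfamily: str    # e.g. "A" for CslA
    gene_number: int  # gene number within the subfamily
    chromosome: int   # homeologous group, 1 to 7
    subgenome: str    # "A", "B" or "D"
    arm: str          # "L", "S", or "" when no arm is given


# Assumed pattern, inferred from names such as TaCslA1_1AL and TaCslA6_3B.
_NAME = re.compile(r"^TaCsl([A-Z])(\d+)_([1-7])([ABD])([LS]?)$")


def parse_tacsl_name(name):
    """Split a TaCsl gene name into its components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised TaCsl name: {name!r}")
    subfamily, number, chromosome, subgenome, arm = match.groups()
    return CslName(subfamily, int(number), int(chromosome), subgenome, arm)


example = parse_tacsl_name("TaCslA1_1AL")
no_arm = parse_tacsl_name("TaCslA6_3B")
```

Parsed names of this kind could then be tallied by chromosome or subgenome, for instance with `collections.Counter`, to summarise how the genes are distributed.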
Splice variants of Csl genes
Twenty-two of the 108 genes appeared to encode two or more proteins because of the presence of alternative splicing sites, as predicted by the Ensembl database, which would result in 137 probable Csl protein products (Table 2). Splice variants were predicted in all the subfamilies of the TaCsl genes except TaCslD (Table 2). In the subfamily TaCslA, 6 genes were alternatively spliced to form 13 putative proteins, whereas in the subfamily TaCslC, 5 genes were alternatively spliced, resulting in 14 putative proteins. Similarly, for the subfamilies TaCslE and TaCslF, alternative splicing resulted in 7 and 10 splice variants, respectively. Alternative splicing of 1 and 2 genes generated 3 and 4 putative proteins in the CslH and CslJ subfamilies, respectively (Fig. 2). More than half (51%) of the splice variants stemmed from exon skipping, ~24% from alternative 5′ and 3′ splice sites, and the rest, ~24%, from intron retention (Table 2).
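The total of 137 putative products can be reconciled with the per-subfamily figures by a short calculation. It assumes that each per-subfamily number above counts all protein products of the alternatively spliced genes in that subfamily, and that every remaining gene yields a single product. The variable names are illustrative.

```python
total_genes = 108
spliced_genes = 22

# Protein products of the alternatively spliced genes, as stated in the text.
products_from_spliced = {
    "TaCslA": 13,
    "TaCslC": 14,
    "TaCslE": 7,
    "TaCslF": 10,
    "TaCslH": 3,
    "TaCslJ": 4,
}

# Genes without predicted splice variants contribute one product each.
single_product_genes = total_genes - spliced_genes
total_products = single_product_genes + sum(products_from_spliced.values())
```

Under these assumptions, 86 single-product genes plus 51 products from the 22 spliced genes give the reported total of 137.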
Conserved motifs and domains
All predicted TaCSL proteins contain either the Pfam glycosyltransferase family 2_3 (GT) domain (PF13641) or the cellulose_synt domain (PF03552), considered to be the signature domains of the GT2 superfamily [12, 26]. Subfamilies TaCslA and TaCslC contained GT 2_3, and CslD, CslE, CslF, CslH, and CslJ contained the cellulose_synt domain (Fig. 2). All the TaCsl translated products contained the motifs D, DXD, D and QXXRW except eight truncated genes that lacked some of these motifs, apparently because of missing sequence (TaCslA7_2DS, TaCslD4_1BS, TaCslD4_5BS, TaCslF2_7BL, TaCslF6_7AL, TaCslF6_7DL, TaCslH3_3AS, TaCslH2_3B). Rice CesA10, 11 and CslH3 also contained only the DXD motif but lacked the D and QXXRW motifs. The variable amino acids in the conserved motifs DXD and QXXRW were diverse in different subfamilies of Csl genes, for example, for TaCslA (DMD, QQH/FRW); TaCslC (DMD, QQHRW); TaCslD (DCD, QVLRW); TaCslE (DCD, QHKRW); TaCslF (DC/GD, QI/VL/VRW); TaCslH (DCD, QF/YKRW); TaCslJ (DCD, QNKRW). These motifs are highlighted in alignment files in the text file S_2a-f.
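A simple pattern scan shows how the DXD and QXXRW motifs can be located in a protein sequence. This is a minimal sketch with a hypothetical function name and a made-up toy peptide. The single conserved D residues are identified by their position in an alignment and are not covered here, and in a full-length protein the short DXD pattern will also match by chance, so hits need to be checked against the alignment.

```python
import re

# Lookahead patterns so that overlapping matches are also reported.
_PATTERNS = {
    "DXD": re.compile(r"(?=(D[A-Z]D))"),
    "QXXRW": re.compile(r"(?=(Q[A-Z]{2}RW))"),
}


def find_motifs(protein):
    """Return {motif: [(0-based start, matched residues), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(protein)]
        for name, pattern in _PATTERNS.items()
    }


# Toy peptide (hypothetical) carrying the TaCslD-type variants DCD and QVLRW.
toy_peptide = "MAADCDGSLLQVLRWAT"
found = find_motifs(toy_peptide)
```

The matched residues make it easy to tabulate which variant of each motif (for example DMD versus DCD) occurs in each subfamily.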
Phylogenetic analysis of the CslD subfamily
The evolutionary history of the CslD subfamily from Arabidopsis, Brachypodium, rice, maize and wheat was inferred using the Neighbor-Joining method in MEGA6, after grouping the orthologs from various species into different clades (Fig. 3). Rice Csl genes were used as reference because their complete nomenclature is well documented. All the genes grouped into three clades. The first clade contained the CslD2 and CslD1 genes from rice and their orthologs from the remaining species. The three homeologous genes of wheat branched together with OsCslD1; wheat genes under this clade were named TaCslD1_1AL, TaCslD1_1BL, and TaCslD1_1DL. The second clade contained two subgroups with the orthologs of rice genes CslD3 and CslD5 from different species. The genes in the first subgroup were named TaCslD3_2AS, TaCslD3_2BS, and TaCslD3_2DS, and those of the second subgroup TaCslD5_7AL, TaCslD5_7BL, and TaCslD5_7DL. The last clade was composed of the orthologs of rice CslD4 and the wheat genes TaCslD4_5BS, TaCslD4_1BS and TaCslD4_5DS. Here we found only two homeologs of TaCslD4, but a gene from chromosome arm 1BS (TaCslD4_1BS) of wheat grouped together with the TaCslD4 genes (bootstrap = 1000), pointing to a translocation from its original A genome (Table 1). This gene shared sequence identity of 85% with TaCslD4_5BS at the amino acid level. OsCslD genes shared 73–86% sequence identity with the corresponding wheat orthologs.
Gene structure and intron evolution of TaCslD subfamily
The 12 TaCslD genes identified from bread wheat ranged in size from 1519 to 5864 bp. The TaCslD4_1BS gene was the shortest and TaCslD1_1AL was the longest. Homeologous copies of all the genes shared sequence identity ranging from 87 to 94% at the nucleotide level. The variation in size among different genes was primarily because of the number and length of introns but also because of a lack of the complete sequences in the database (Fig. 4). The number of introns in all the genes varied from 2 to 4. Two homeologs, TaCslD1_1AL and TaCslD1_1BL, each contained three introns whereas a third homeolog (TaCslD1_1DL) had four. The genes TaCslD3, TaCslD4 and their homeologs contained three introns each, except TaCslD4_1BS with only two introns. TaCslD5 and its homeologs also had two introns each. For the phases of introns, the genes from the TaCslD subfamily exhibited variable patterns of distribution. Introns 1, 2 and 3 of TaCslD1_1AL, TaCslD1_1BL and TaCslD1_1DL were in phases 2, 0, and 0, respectively, whereas the fourth intron of TaCslD1_1DL was in phase 0. Introns 1 and 2 of TaCslD3_2AS, TaCslD3_2BS and TaCslD3_2DS were both in phase 0. The third intron of these genes was in phase 2, 1 and 2, respectively. The genes TaCslD4_5BS, TaCslD4_5DS, TaCslD5_7AL, TaCslD5_7BL and TaCslD5_7DL had introns 1 and 2 in phases 2 and 0, respectively, and the third intron of TaCslD4_5BS and TaCslD4_5DS was in phase 0 and 2, respectively. TaCslD4_1BS had introns 1 and 2 in phases 1 and 0. The largest proportion of introns (60%) of all the genes was in phase 0, followed by phase 2 (34%) with a few in phase 1 (6%).
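Intron phase follows directly from where an intron interrupts the reading frame: an intron is in phase 0 if it lies between two codons, phase 1 if it follows the first base of a codon, and phase 2 if it follows the second. The sketch below derives phases from the coding lengths of consecutive exons. The function name and the exon lengths are hypothetical and are not taken from the TaCslD gene models.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase of each intron given coding exon lengths in order.

    The phase of an intron is the number of coding bases preceding it,
    modulo 3. A gene with n coding exons has n - 1 introns.
    """
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


# Hypothetical coding exon lengths giving the phase pattern 2, 0, 0.
phases = intron_phases([200, 301, 99, 150])
```

Here 200 coding bases precede the first intron (200 mod 3 = 2), 501 precede the second and 600 the third (both divisible by 3), which reproduces a 2, 0, 0 pattern like the one reported for the first three introns of the TaCslD1 homeologs.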
Expression analysis of TaCsl genes from bread wheat
Publicly available RNA-Seq datasets were used to analyse the expression of TaCsl genes over three developmental stages and different tissues of wheat including root, stem, leaf, spike, and grain. Expression data were available for 32 of the TaCslA genes. Two genes (TaCslA1_6AS and TaCslA1_6BS) were expressed in all the tissues except reproductive stem and leaves. Four genes (TaCslA5_2BS, TaCslA5_2DS, TaCslA6_3B, and TaCslA6_3AL) were expressed moderately. The TaCslA9 gene was highly expressed in the leaf tissue from the reproductive stage while the transcript abundance of the remaining genes was low (Fig. 5). TaCslC subfamily genes, with the exception of TaCslC3, TaCslC9 and two homeologs of TaCslC10, were expressed highly in root and spike tissues. Two genes, TaCslC1 and TaCslC7, and their homeologs displayed moderate to high expression in all the tissues at the seedling and vegetative stages. One gene (TaCslC10_5DL) exhibited moderate to high expression in all the tissues studied except reproductive stem and grain (Fig. 6). Expression of most of the genes of the TaCslD subfamily ranged from moderate to high in the spike and root tissues but was very low in all the other tissues (Fig. 7). Three of the 10 TaCslE subfamily genes (TaCslE2_6AL, TaCslE2_6BL and TaCslE3) were expressed at moderate to high levels in all the tissues. The remaining genes were expressed at a very low level in all the tissues (Fig. 8). A mixed pattern of expression was observed in the large TaCslF subfamily. Three genes (TaCslF6_7AL, TaCslF6_7BL and TaCslF6_7DL) were highly expressed in all the tissues except the leaves at the reproductive stage. Two genes (TaCslF4_2BS and TaCslF4_2DS) were highly expressed in the stem tissue, but only at a low or moderate level in all other tissues. All other genes were expressed at low or moderate levels in one or more tissues (Fig. 9).
In the TaCslH subfamily, one of the eight genes, TaCslH1_2BL, was expressed from moderate to high levels in the leaf, stem and spike tissues. The remaining genes were expressed from low to moderate levels in all the tissues (Fig. 10). Three out of four members of the subfamily TaCslJ were expressed from low to moderate levels in the leaf and root tissues while one gene (TaCslJ1_3DS) was poorly expressed in all the tissues studied (Fig. 10).
Grass cell walls contain 20–40% non-cellulosic polysaccharides. The proportion and composition of these polysaccharides vary in different plant species. After the first report demonstrating β-glucan synthase activity in a Csl-encoded protein was published, several members of the Csl gene family have been reported to be involved in the formation of the backbone of the hemicellulosic polysaccharides [16, 18, 19, 26, 38, 40, 41]. As information on the identity of the Csl genes in wheat was lacking, we undertook this study to fill this gap.
We retrieved 108 TaCsl genes from wheat using two conserved domains, PF00535 and PF03552, which were previously shown to be present in the derived proteins of all the Csl genes. These genes include homeologs from the A, B and D genomes of bread wheat. Similar patterns of homeologous genes were found for the FLOWERING LOCUS T (FT), Pairing homeologous 1 (Ph1) and ADP-glucose pyrophosphorylase (AGPase) gene families of hexaploid wheat. Approximately a quarter of the identified Csl genes were predicted to be alternatively spliced, possibly contributing to the diversity of encoded enzymes. A recent study suggested that alternative splicing was common in plants and accounted for about 20% of the loci transcribed in the leaf and spike tissues of Aegilops tauschii. In the case of germinating barley embryos, 14–20% of intron-containing genes were alternatively spliced. This phenomenon, apparently meant to increase the fitness of an organism, has not thus far been reported for the Csl genes from other species.
The TaCsl genes were distributed across all the wheat chromosomes except one, chromosome 4 (Fig. 11). A similar trend of Csl gene distribution was observed in barley [9, 44, 45]. More than half the TaCsl genes were located on only two chromosomes: 2 (32%) and 3 (22%). This suggests hyper-multiplication of the Csl genes on these chromosomes although the reasons for this phenomenon are unknown. It appears, though, that cis duplication of the Csl genes was favored over trans duplication in wheat. Five of the nine CslF genes in barley were located on chromosome 2H. In fact, the barley CslF gene was assigned its role in mixed-linked glucan (MLG) formation via syntenic orthology with rice long before the barley genome sequence became available. A detailed analysis of the rice syntenic region corresponding to a known QTL for MLG from barley, which had been published previously, initially led to the breakthrough of the role of CslF in the formation of this polysaccharide. A similar cluster of CslF genes was also detected in the conserved syntenic regions of Brachypodium and sorghum on chromosomes 1 and 2, respectively.
The observation that only half of the genes from the subfamily CslA were expressed at varying levels in the studied tissues suggests that the apparently silent genes may provide a backup under stressful conditions. Alternatively, they may be expressed only transiently in specialized cells or cell parts at levels too low to be detected by the method used to study expression. The first biochemical evidence for the relationship of CslA genes with mannan synthase activity came from the expression of a guar CSLA cDNA in soybean somatic embryos. Subsequent studies in insect cells demonstrated that CslA family members encode glucomannan synthases [16, 46]. Reverse genetic and biochemical approaches in Arabidopsis and Dendrobium officinale have also allowed association of certain CslA genes with glucomannan biosynthesis [41, 47]. A recent study in wheat suggested the involvement of a gene from the CslA subfamily in the development of tillers, cell wall composition and stem strength. This study further suggested the probable role of CslA gene transcript levels in carbon partitioning throughout the plant.
For the subfamilies TaCslC and TaCslD, most of the genes were relatively highly expressed in the root and spike tissues during the vegetative as well as reproductive phases. Heterologous expression in Pichia revealed that the CslC-encoded enzymes made β-1,4-glucan, the backbone of xyloglucan. The CslD subfamily is conserved in all land plants and is most closely related to the CesA gene family with 40–50% sequence similarity at the amino acid level. Similar to CesAs, the CslD subfamily is ubiquitous in all plant genomes examined to date, unlike other, taxa-specific Csl subfamilies. Previous reports also showed the involvement of certain members of the CslD subfamily in tip growth, for example development of root hairs and pollen tube elongation [51, 52], normal plant growth [50, 53], and meristem morphology [53, 54]. More recently, their role in resistance against biotic stresses has been described. Adding to this discussion, our in silico expression analysis suggests the involvement of certain TaCslD genes in spike development. This suggestion is supported by the observation that a mutant, slender leaf 1 (sle1), which encodes the CSLD4 protein in rice, reduces the number and width of spikelets in the panicle.
Two groups of Csl genes, CslF and CslH, have evolved independently in grasses. A third group, CslJ, originally believed to be specific to grasses, was recently identified in some dicots [11, 13]. Although the TaCslF6 gene showed higher expression in all the studied tissues except the leaf tissue from the reproductive stage, it was the only member of the TaCslF subfamily that was highly expressed in the grain tissue. Several studies have demonstrated the functional role of CslF6 and CslH in the synthesis of MLG [18, 44, 58, 59]. Only one member of all the genes in these families, CslF6, was expressed in the grain, suggesting that it was responsible for MLG formation. MLG is a desirable polysaccharide as a dietary fiber but undesirable for the brewing industry because it causes haze in beer. It should be possible to select natural variants for the expression of the CslF6 gene to obtain an increased or reduced MLG content depending upon the target market for the grain.
Differential expression patterns were observed among homeologous copies from three different genomes of bread wheat, which agree with the previous studies reporting unequal contributions of the three genomes toward gene expression. Interestingly, the homeologous copies of TaCslD genes also differed from each other in terms of intron phase evolution, indicating structural and functional divergence of homeologous gene copies (Fig. 4). Most introns were present in phase 0, which is in accordance with previous findings showing an intron bias in favour of phase 0 [7, 60, 61]. The three homeologs of each gene were not observed for all the genes reported in this study. This could be because of the incomplete sequencing information or because of the elimination of the genes during the allopolyploidization of wheat.
We have identified 108 TaCsl genes in bread wheat and classified them into seven subfamilies (CslA, CslC, CslD, CslE, CslF, CslH, and CslJ). Two or three homeoalleles were identified for most of the Csl genes. Although located on all the wheat chromosomes except chromosome 4, the Csl genes were especially concentrated on chromosomes 2 and 3, suggesting selective, localized duplication in cis phase. Only one of the 29 CslF genes, CslF6, was expressed in the grain, suggesting its role in mixed-linked glucan formation. Neither CslJ nor CslH was expressed in the grain. Information in this report will be helpful in designing experiments to alter wall composition in wheat for improving grain quality, culm strength, or culm composition for biofuels.
Pauly M, Keegstra K. Cell-wall carbohydrates and their modification as a resource for biofuels. Plant J. 2008;54(4):559–68.
Sandhu APS, Randhawa GS, Dhugga KS. Plant cell wall matrix polysaccharide biosynthesis. Mol Plant. 2009;2(5):840–50.
Sorek N, Yeats TH, Szemenyei H, Youngs H, Somerville CR. The implications of Lignocellulosic biomass chemical composition for the production of advanced biofuels. Bioscience. 2014;64(3):192–201.
Pauly M, Gille S, Liu L, Mansoori N, de Souza A, Schultink A, Xiong G. Hemicellulose biosynthesis. Planta. 2013;238(4):627–42.
Richmond TA, Somerville CR. The cellulose synthase superfamily. Plant Physiol. 2000;124(2):495–8.
Rai KM, Thu SW, Balasubramanian VK, Cobos CJ, Disasa T, Mendu V. Identification, characterization, and expression analysis of Cell Wall related genes in Sorghum Bicolor (L.) Moench, a food, fodder, and biofuel crop. Front Plant Sci. 2016;1287.
Kaur S, Dhugga KS, Gill K, Singh J. Novel structural and functional motifs in cellulose synthase (CesA) genes of bread wheat (Triticum Aestivum, L.). PLoS One. 2016;11(1):e0147046.
Hazen SP, Scott-Craig JS, Walton JD. Cellulose synthase-like genes of rice. Plant Physiol. 2002;128(2):336–40.
Schwerdt JG, MacKenzie K, Wright F, Oehme D, Wagner JM, Harvey AJ, Shirley NJ, Burton RA, Schreiber M, Halpin C. Evolutionary dynamics of the cellulose synthase gene superfamily in grasses. Plant Physiol. 2015;168(3):968–83.
Burton RA, Collins HM, Kibble NA, Smith JA, Shirley NJ, Jobling SA, Henderson M, Singh RR, Pettolino F, Wilson SM, et al. Over-expression of specific HvCslF cellulose synthase-like genes in transgenic barley increases the levels of cell wall (1,3;1,4)-beta-d-glucans and alters their fine structure. Plant Biotechnol J. 2011;9(2):117–35.
Farrokhi N, Burton RA, Brownfield L, Hrmova M, Wilson SM, Bacic A, Fincher GB. Plant cell wall biosynthesis: genetic, biochemical and functional genomics approaches to the identification of key genes. Plant Biotechnol J. 2006;4(2):145–67.
Yin Y, Johns MA, Cao H, Rupani M. A survey of plant and algal genomes and transcriptomes reveals new insights into the evolution and function of the cellulose synthase superfamily. BMC Genomics. 2014;15(1):1.
Vogel JP, Garvin DF, Mockler TC, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, Harmon-Smith M, Lail K. Genome sequencing and analysis of the model grass Brachypodium Distachyon. Nature. 2010;463(7282):763–8.
Dhugga KS. Biosynthesis of non-cellulosic polysaccharides of plant cell walls. Phytochemistry. 2012;74:8–19.
Dhugga KS, Barreiro R, Whitten B, Stecca K, Hazebroek J, Randhawa GS, Dolan M, Kinney AJ, Tomes D, Nichols S. Guar seed ß-mannan synthase is a member of the cellulose synthase super gene family. Science. 2004;303(5656):363–6.
Liepman AH, Wilkerson CG, Keegstra K. Expression of cellulose synthase-like (Csl) genes in insect cells reveals that CslA family members encode mannan synthases. Proc Natl Acad Sci U S A. 2005;102(6):2221–6.
Burton RA, Wilson SM, Hrmova M, Harvey AJ, Shirley NJ, Medhurst A, Stone BA, Newbigin EJ, Bacic A, Fincher GB. Cellulose synthase-like CslF genes mediate the synthesis of cell wall (1, 3; 1, 4)-ß-D-glucans. Science. 2006;311(5769):1940–2.
Doblin MS, Pettolino FA, Wilson SM, Campbell R, Burton RA, Fincher GB, Newbigin E, Bacic A. A barley cellulose synthase-like CSLH gene mediates (1,3;1,4)-beta-D-glucan synthesis in transgenic Arabidopsis. Proc Natl Acad Sci U S A. 2009;106(14):5996–6001.
Cocuron JC, Lerouxel O, Drakakaki G, Alonso AP, Liepman AH, Keegstra K, Raikhel N, Wilkerson CG. A gene from the cellulose synthase-like C family encodes a beta-1,4 glucan synthase. Proc Natl Acad Sci U S A. 2007;104(20):8550–5.
Gupta PK, Mir RR, Mohan A, Kumar J. Wheat genomics: present status and future prospects. Int J Plant Genomics. 2008;2008:896451.
Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014;345(6194):1251788.
International Wheat Genome Sequencing Consortium (IWGSC). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014;345(6194):1251788.
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279–85.
Kim E, Magen A, Ast G. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 2007;35(1):125–31.
Girke T, Lauricha J, Tran H, Keegstra K, Raikhel N. The cell wall navigator database. A systems-based approach to organism-unrestricted mining of protein families involved in cell wall metabolism. Plant Physiol. 2004;136(2):3003–8.
Yin Y, Huang J, Xu Y. The cellulose synthase superfamily in fully sequenced plants and algae. BMC Plant Biol. 2009;9:99.
Richmond T. Higher plant cellulose synthases. Genome Biol. 2000;1(4):REVIEWS3001.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal omega. Mol Syst Biol. 2011;7(1):539.
Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI. CDD: NCBI's conserved domain database. Nucleic Acids Res. 2014:gku1221.
Kaur R, Singh K, Singh J. A root-specific wall-associated kinase gene, HvWAK1, regulates root growth and is highly divergent in barley and other cereals. Funct Integr Genomics. 2013;13(2):167–77.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66.
Felsenstein J. Phylogeny inference package (version 3.2). Cladistics. 1996;5:164–6.
Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–25.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9.
Choulet F, Alberti A, Theil S, Glover N, Barbe V, Daron J, Pingault L, Sourdille P, Couloux A, Paux E. Structural and functional partitioning of bread wheat chromosome 3B. Science. 2014;345(6194):1249721.
Wang L, Guo K, Li Y, Tu Y, Hu H, Wang B, Cui X, Peng L. Expression profiling and integrative analysis of the CESA/CSL superfamily in rice. BMC Plant Biol. 2010;10
Saxena IM, Brown R. Identification of a second cellulose synthase gene (acsAII) in Acetobacter xylinum. J Bacteriol. 1995;177(18):5276–83.
Burton RA, Wilson SM, Hrmova M, Harvey AJ, Shirley NJ, Medhurst A, Stone BA, Newbigin EJ, Bacic A, Fincher GB. Cellulose synthase-like CslF genes mediate the synthesis of cell wall (1,3;1,4)-beta-D-glucans. Science. 2006;311(5769):1940–2.
Goubet F, Barton CJ, Mortimer JC, Yu X, Zhang Z, Miles GP, Richens J, Liepman AH, Seffen K, Dupree P. Cell wall glucomannan in Arabidopsis is synthesised by CSLA glycosyltransferases, and influences the progression of embryogenesis. Plant J. 2009;60(3):527–38.
Zhang Q, Zhang X, Wang S, Tan C, Zhou G, Li C. Involvement of alternative splicing in barley seed germination. PLoS One. 2016;11(3):e0152824.
Zhou Y, Zhou C, Ye L, Dong J, Xu H, Cai L, Zhang L, Wei L. Database and analyses of known alternatively spliced genes in plants. Genomics. 2003;82(6):584–95.
Schreiber M, Wright F, MacKenzie K, Hedley PE, Schwerdt JG, Little A, Burton RA, Fincher GB, Marshall D, Waugh R. The barley genome sequence assembly reveals three additional members of the CslF (1,3;1,4)-β-glucan synthase gene family. PLoS One. 2014;9(3):e90888.
Burton RA, Jobling SA, Harvey AJ, Shirley NJ, Mather DE, Bacic A, Fincher GB. The genetics and transcriptional profiles of the cellulose synthase-like HvCslF gene family in barley. Plant Physiol. 2008;146(4):1821–33.
Suzuki S, Li L, Sun Y-H, Chiang VL. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiol. 2006;142(3):1233–45.
He C, Wu K, Zhang J, Liu X, Zeng S, Yu Z, Zhang X, da Silva JAT, Deng R, Tan J. Cytochemical localization of polysaccharides in Dendrobium officinale and the involvement of DoCSLA6 in the synthesis of mannan polysaccharides. Front Plant Sci. 2017;8:173.
Hyles J, Vautrin S, Pettolino F, MacMillan C, Stachurski Z, Breen J, Berges H, Wicker T, Spielmeyer W. Repeat-length variation in a wheat cellulose synthase-like gene is associated with altered tiller number and stem cell wall composition. J Exp Bot. 2017;68(7):1519–29.
Doblin MS, De Melis L, Newbigin E, Bacic A, Read SM. Pollen tubes of Nicotiana alata express two genes from different β-glucan synthase families. Plant Physiol. 2001;125(4):2040–52.
Hunter CT, Kirienko DH, Sylvester AW, Peter GF, McCarty DR, Koch KE. Cellulose Synthase-like D1 is integral to normal cell division, expansion, and leaf development in maize. Plant Physiol. 2012;158(2):708–24.
Kim CM, Park SH, Je BI, Park SH, Park SJ, Piao HL, Eun MY, Dolan L, Han CD. OsCSLD1, a cellulose synthase-like D1 gene, is required for root hair morphogenesis in rice. Plant Physiol. 2007;143(3):1220–30.
Yuo T, Shiotani K, Shitsukawa N, Miyao A, Hirochika H, Ichii M, Taketa S. Root hairless 2 (rth2) mutant represents a loss-of-function allele of the cellulose synthase-like gene OsCSLD1 in rice (Oryza sativa L.). Breed Sci. 2011;61(3):225–33.
Li M, Xiong G, Li R, Cui J, Tang D, Zhang B, Pauly M, Cheng Z, Zhou Y. Rice cellulose synthase-like D4 is essential for normal cell-wall biosynthesis and plant growth. Plant J. 2009;60(6):1055–69.
Bernal AJ, Jensen JK, Harholt J, Sørensen S, Moller I, Blaukopf C, Johansen B, De Lotto R, Pauly M, Scheller HV. Disruption of ATCSLD5 results in reduced growth, reduced xylan and homogalacturonan synthase activity and altered xylan occurrence in Arabidopsis. Plant J. 2007;52(5):791–802.
Douchkov D, Lueck S, Hensel G, Kumlehn J, Rajaraman J, Johrde A, Doblin MS, Beahan CT, Kopischke M, Fuchs R. The barley (Hordeum vulgare) cellulose synthase-like D2 gene (HvCslD2) mediates penetration resistance to host-adapted and nonhost isolates of the powdery mildew fungus. New Phytol. 2016;212:421–33.
Yoshikawa T, Eiguchi M, Hibara K-I, Ito J-I, Nagato Y. Rice SLENDER LEAF 1 gene encodes cellulose synthase-like D4 and is specifically expressed in M-phase cells to regulate cell proliferation. J Exp Bot. 2013;64(7):2049–61.
Burton RA, Collins HM, Kibble NA, Smith JA, Shirley NJ, Jobling SA, Henderson M, Singh RR, Pettolino F, Wilson SM. Over-expression of specific HvCslF cellulose synthase-like genes in transgenic barley increases the levels of cell wall (1,3;1,4)-β-D-glucans and alters their fine structure. Plant Biotechnol J. 2011;9(2):117–35.
Taketa S, Yuo T, Tonooka T, Tsumuraya Y, Inagaki Y, Haruyama N, Larroque O, Jobling SA. Functional characterization of barley betaglucanless mutants demonstrates a unique role for CslF6 in (1,3;1,4)-beta-D-glucan biosynthesis. J Exp Bot. 2012;63(1):381–92.
Nemeth C, Freeman J, Jones HD, Sparks C, Pellny TK, Wilkinson MD, Dunwell J, Andersson AAM, Aman P, Guillon F, et al. Down-regulation of the CSLF6 gene results in decreased (1,3;1,4)-beta-D-glucan in endosperm of wheat. Plant Physiol. 2010;152(3):1209–18.
Lynch M. Intron evolution as a population-genetic process. Proc Natl Acad Sci U S A. 2002;99(9):6118–23.
Bhattachan P, Dong B. Origin and evolutionary implications of introns from analysis of cellulose synthase gene. J Syst Evol. 2017;55(2):142–8.
This work was supported by the CGIAR’s Consortium Research Program WHEAT (KSD), the Canada Foundation for Innovation (CFI), the ministère de l’Économie, de la science et de l’innovation du Québec (MESI) and the Fonds de recherche du Québec - Nature et technologies (FRQ-NT) (RB), and the Natural Sciences and Engineering Research Council of Canada through its Discovery program (JS).
Computations were made on the supercomputer Guillimin from McGill University, managed by Calcul Québec and Compute Canada. The operation of this supercomputer is funded by the Canada Foundation for Innovation (CFI), the ministère de l’Économie, de la science et de l’innovation du Québec (MESI) and the Fonds de recherche du Québec - Nature et technologies (FRQ-NT).
Natural Sciences and Engineering Research Council of Canada.
Availability of data and materials
All the data are included in the supplementary files.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
FASTA sequences of CSL proteins used for the phylogenetic analysis. (PDF 453 kb)
List of Csl subfamily genes, their protein sizes (number of amino acids), and multiple protein sequence alignments for different subfamilies. The conserved motifs (D, D, DXD, QXXRW) diagnostic of CSL proteins are highlighted with red boxes for each of the subfamilies. (PDF 465 kb)
Cite this article
Kaur, S., Dhugga, K.S., Beech, R. et al. Genome-wide analysis of the cellulose synthase-like (Csl) gene family in bread wheat (Triticum aestivum L.). BMC Plant Biol 17, 193 (2017). https://doi.org/10.1186/s12870-017-1142-z
- Cell wall
- Mixed-linked glucan