Comprehensive molecular evaluation of the histone methyltransferase gene family and their important roles in two-line hybrid wheat

Histone methylation usually plays important roles in plant development through post-translational regulation and may provide a new visual field for heterosis. The histone methyltransferase gene family has been identified in various plants, but its members and functions in hybrid wheat related in heterosis is poorly studied. In this study, 175 histone methyltransferase (HMT) genes were identified in wheat, including 152 histone lysine methyltransferase (HKMT) genes and 23 protein arginine methyltransferase (PRMT) genes. Gene structure analysis, physicochemical properties and subcellular localization predictions of the proteins, exhibited the adequate complexity of this gene family. As an allohexaploid species, the number of the genes (seven HKMTs orthologous groups and four PRMTs orthologous groups) in wheat were about three times than those in diploids and showed certain degrees of conservation, while only a small number of subfamilies such as ASH-like and Su-(var) subfamilies have expanded their members. Transcriptome analysis showed that HMT genes were mainly expressed in the reproductive organs. Expression analysis showed that some TaHMT genes with different trends in various hybrid combinations may be regulated by lncRNAs with similar expression trends. Pearson correlation analysis of the expression of TaHMT genes and two yield traits indicated that four DEGs may participate in the yield heterosis of two-line hybrid wheat. ChIP-qPCR results showed that the histone modifications (H3K4me3, H3K36me3 and H3K9ac) enriched in promoter regions of three TaCCA1 genes which are homologous to Arabidopsis heterosis-related CCA1/LHY genes. The higher expression levels of TaCCA1 in F1 than its parents are positive with these histone modifications. These results showed that histone modifications may play important roles in wheat heterosis. Our study identified characteristics of the histone methyltransferase gene family and enhances the understanding of the evolution and function of these members in allohexaploid wheat. The causes of heterosis of two-line hybrid wheat were partially explained from the perspective of histone modifications.

in Western Asia about 8000 years ago and consists of allohexaploid (AABBDD) in three sets of genomes: Triticum urartu (A-genome donor), an Aegilops speltoidesrelated grass (B-genome donor), and Aegilops tauschii (D-genome donor) [1,2]. The huge genome and complex structure of wheat kept its genetic information a mystery for a long time. After years of unremitting scientific effort, some high-quality genome annotation data have been released, and multiple genome assemblies are available, which provide a powerful basis for the study of functional genomics [3][4][5][6][7]. A large amount of genetic diversity of cultivated crop species reside in its wild relatives. To harness that diversity and to expand the genetic base of the cultivated species, it is important to understand the evolutionary relationships among them and to know which genomic and phenomic tools would be appropriate.
Epigenetics, including DNA methylation, chromatin modification, and RNA editing, is a branch of genetics that studies the heritable changes of gene expression without changing the nucleotide sequence of genes [8].
In eukaryotes, nucleosomes that contain four groups of proteins (H2A, H2B, H3, and H4) are the main structural components of chromatin. The four groups of proteins are two H3:H4 dimers and two H2A:H2B dimers, as well as a mixture of 146 bp DNA and a double helix. The N-end trailing of each histone conserved in vitro is the point of many targeted signal transduction pathways, resulting in post-translational modifications [9,10]. Histone methylation is one of most important epigenetic modifications, and it plays a fundamental role in a range of biological processes, from transcriptional regulation to heterochromatin formation [11]. Methylation sites are usually located on the lysine (K) and arginine (R) residues of H3 and H4 and thus determine the two methylation patterns of histone lysine (Lys, K) methylation (HKMT) and histone arginine (Arg, R) methylation (PRMT). Methylation occurred in the arginine and lysine residues of histones, including seven sites in arginine and 17 sites in lysine [11]. These complex modification processes, one earlier and one more advanced, are deeply associated with gene expression. Studies have shown that histone methylation, catalyzed by HKMT and PRMT proteins, plays an important role in plant developmental processes, such as floral organogenesis, seed development, and plant senescence and defense [12][13][14][15][16].
The two-line method is a preferred way of realizing the heterosis of wheat, but the results of research on the molecular mechanism of the heterosis remain relatively weak. Generally, there is a strong correlation between heterosis and the genetic distance of the parents [17], but the study has shown that genetic distance does not fully explain heterosis [18]. The allelic variation caused by epigenetic modification provides a novel way of explaining heterosis. Previous study has shown that histone modification is closely related to the differential expression of genes in hybrid maize [19]. The methylation of H3K4me2 can reduce the expression of two genes: CCA1 and LHY, which are the cores of heterosis in Arabidopsis, resulting in the improvement of photosynthetic efficiency of F 1 and the emergence of obvious hybrids [20]. Generally, H3K9ac and H3K4me3 occur in regions of normal chromatin and activate the expression of genes, whereas H3K9me2 and H3K27me3 tend to occur in the heterochromatin regions and usually inhibit gene expression [21,22]. The levels of H3K4me3 and H3K27me3 vary greatly among ecotypes and subspecies, but they showed additive patterns in Arabidopsis hybrids, whereas in rice they showed non-additive ones [19,23]. Proteins in hybrid F 1 , such as H3K9ac, H3K4me2, and H3K4me3, may be related to heterosis through the modification of different types of histone proteins to regulate circadian genes CCA1 and LHY [20]. However, due to the various methods of modification, the regulation mechanisms of histone modification and heterosis require further exploration. lncRNA-mediated deposition of H3K27me3 is critical for chromatin modification and transcriptional regulation [24,25]. lncRNAs participate in histone modification to regulate a series of important biological processes, such as vernalization and flowering, which can affect plant yield and quality [26,27]. MicroRNAs (miR-NAs) are a class of endogenous small noncoding RNAs which are involved in the regulation of gene expression at the posttranscriptional level by degrading their target mRNAs and/or inhibiting their translation. MicroRNAmediated regulation is a key component in a wide range of biological processes such as plant developmental plasticity, abiotic/biotic responses, and symbiotic/parasitic interactions [28,29]. HMT genes have been identified and studied in many plant systems and are widely involved in the regulation of plant growth and development [13,[30][31][32][33][34][35][36]. Although it is a fundamental process in an important cereal crop, histone methylation remains mysterious, and the role of regulatory factors such as miRNAs and lncRNAs in regulating histone methylation is largely unknown.
In this study, we provided a detailed overview of the phylogeny and expression of HMT genes in different hybrid bread wheats. We aim to explore the distribution and evolution of HMT members that catalyze histone methylation in wheat, and search for some specific HMTs which may participate in the construction of heterosis in hybrid wheat. All our efforts enhance the understanding of the evolution and function of the histone methylation family and sheds light on the histone methylation involved in regulating wheat yield.

Identification of TaHMT genes from the whole wheat genome
Hmmsearch and the BLASTP program obtained 152 HKMT (histone lysine methyltransferase) genes and 23 PRMT (histone arginine methyltransferase) genes, which were named TaHMT1-TaHMT152, coinciding with   TaHKMT1-TaHKMT152, TaHMT153-TaHMT175, and TaPRMT1-23, following the nomenclature 1A to 1D, 2A to 2D, up to 7A to 7D, based on their position on the chromosome. In total, we identified 175 genes that catalyze two kinds of histone methyltransferases (Fig. 1 distributed across all chromosomes and showed a trend of cluster distribution on chr2 and chr3. The small number of PRMT genes entails that they are orphans that exist on some chromosomes, such as chr4A ( Figure S1).
HMT gene family of wheat has expanded with more in-paralogs genes Sorghum bicolor, Hordeum vulgare, and Brachypodium distachyon are usually considered to be relatives of wheat due to the whole-genome duplication events that occurred at about 45-60 million years ago [37]. In this study, the HMT gene families in Arabidopsis thaliana, Oryza sativa, Hordeum vulgare, Sorghum bicolor, and Brachypodium distachyon were also comprehensively identified in Table 1 using the same method as that employed for wheat. According to previous results, the number of HMT genes family in diploid species is between 40 and 60. We found that there were 47 and 57 HMT genes in rice (Os) and Arabidopsis (At), respectively, which is close to the results previously obtained. In addition, 54, 56, and 49 HMT genes were obtained from the Sorghum bicolor (Sb), Brachypodium distachyon (Bd), and Hordeum vulgare (Hv) genome data, respectively. The proportion of gene quantity between wheat and other species was 3.07 (At), 3.72 (Os), 3.24 (Sb), 3.13 (Bd), and 3.57 (Hv), close to a ratio of 3:1, as expected (χ 2 test, P > 0.05).
To further explore the evolutionary relationship between wheat and the other five species, the orthologs and in-paralogs between each pair of these species were calculated. The number of orthologs in each species with one other genome is shown in Table 2. The result shows that wheat had a high average ortholog group size of 2.32, which also meant that every HMT gene had an average of 1.32 paralogs. The group sizes of other five species were significantly smaller than those of wheat, indicating that the genome ploidy of wheat enriches the number of paralogous genes for the HMT gene family. Significantly, the fact that wheat and Brachypodium distachyon clearly have more orthologous genes proves a later divergence between the two species, which is also a reflection of the previous conclusion that Brachypodium distachyon has a common ancestor with wheat.
According to the published wheat genome data, there are about in homologous groups of three, termed triads (1:1:1; 35.8% of genes) (IWGSC, 2018) [38]. Our results revealed a higher proportion 92.76% of HKMTs were in triads. In PRMTs, the ratio is 91.3% (Table 3). The high homology retention rate indicates that the HMT gene of three sets of wheat chromosome donors exhibited a certain conservation; in other words, there was almost no lack of homologous genes. This kind of conservative combination of triplets and high homology retention rate can also explain the richness of some gene families in wheat, such as the HMT genes. In addition to this 1:1:1 correspondence, some genes have expanded in breadth along the chromosomes. These members do not strictly follow the three homologies, like HKMT1 and PRMT1. In addition to a correspondence to genes on chromosomes with A/B/D homology, they also show homology with members of other chromosomes to a certain extent. The homology of these genes may have been established before the formation of wheat polyploidy.

Structure, phylogeny, and domain analysis of HMT genes in wheat
The analysis of the physical and chemical properties of the protein sequences encoded by 152 TaHKMT genes revealed that the amino acid content of different HKMT proteins varied significantly, and the molecular weight of the encoded protein was significantly different from 24.76 kD (TaHMT145) to 221.14 kD (TaHMT118). Compared to the HKMT gene family, the protein encoded by the PRMT gene family seems not to show a significant difference in physicochemical properties. In the HKMT protein, the differences in the physicochemical properties of the protein were significantly higher than those of the histone arginine methyltransferase. With the exception of TaHMT154, the isoelectric point (pI) of the proteins encoded by the histone arginine methyltransferase gene was less than 7.0. This shows a more consistent acidity that tends to play a role in the acidic subcellular environment, which may be due to the coding of weakly acidic protein by the PRMT genes. Only 5.26% (8 /152) of HKMT proteins showed high stability, while 34.78% (8/23) of PRMT members were proteins with high stability. These findings suggest that HKMT may have wider catalytic functions and may be involved in more biological processes. Interestingly, the subcellular localization prediction analysis told us that most HKMT proteins are located in the nucleus, while most PRMT proteins are located in the cytoplasm (Table S1). This difference in localization also implies that they may have different physiological functions. The structural analysis of all TaHMT genes indicated that all of the members contained exons, but the number of them was quite different. The TaHMT gene family has a large number of introns and exons, most of which have more than four exons, and their genetic structure is complex and diverse. Interestingly, the number of exons in the HKMT gene family presents a huge difference from 25 (TaHMT113) to 1 (TaHMT13.etc.) (Fig. 2, Table S1).
All 152 TaHKMT proteins were divided into seven classes according to the classification of the SET gene family in Arabidopsis [39]. The seven classes have fully different domain architectures and motif compositions (Figs. 2 and 3). Class I, namely, Enhancer of zeste [E(Z)] homologs (H3K27), has unique domains, such as SWI3, ADA2, N-CoR, and TFIIIB (SANT) DNA-binding domains and DNA binding protein CXC domain. These domains may facilitate the ability of the SET domain to modify histones. Class II consists of 16 SET proteins, which are homologous with Drosophila ortholog ASH1 and may lead to H3K4 and H3K36 modifications. Class III is believed to be the SET protein most responsible for the active mark H3K4me1/2/3, similar to the Drosophila ortholog TRX.
Class IV is the least of all of the classifications, which is highly conservative, and it is mainly characterized by a PHD finger and a C-terminally located SET domain. It is suggested that the members of this group may be involved in the methylation of H3K4. Class V is the largest population, consisting of 63 SET genes homologous to Drosophila SU(VAR)3-9, which is divided into two different subgroups. Subgroup I contains the WIYLD domain, pre-SET domains, SET domains, and post-SET domains, while subgroup II contains a typical SRA domain at the N-terminus, playing an essential role in the establishment of heterochromatic mark H3K9me1/2/3, as well as H4K20mes and H3K27me2. Classes VI and VII contain 44 members with interrupted SET domains and Rubis-subs-bind whose functions are to be determined.
The 23 TaPRMT proteins were divided into four classes, following previous opinion [40]. Classes I-1 to I-3 contain 20 members. Most of them include AdoMet-MTases with arginine methyl-transfer activity, which catalyzes the formation of mono-methylarginine and asymmetric dimethylarginine. TaPRMT18 to 20 belong to class II, which owns the core domain PRMT5, which catalyzes mono-methylarginine and symmetric dimethylarginine (Figs. 2C and 3B). The main reason that the HMT gene family has a large number of members is that the class I/II/V HKMT subfamily is expanding its members during evolution. The ratio of their number in wheat to that in rice and Arabidopsis is higher than the average of 3 (χ 2 test, P < 0.05) (Fig. 4, Table S3).

Expression characterization and transcription regulation analysis of TaHMTs
At present, the expression level of HMTs in various wheat tissues without any stress pressure is still unknown. In our study, the expression of TaHMT genes in the root, young leaf/shoot, and the young spikes during the vegetative growth stage were significantly higher than those of other tissues at other stages, and they may  TaHKM   A   T131   TaHKMT126   TaHKMT121   TaHKMT58   TaHKMT69   TaHKMT81   TaHKMT144   TaHKMT138   TaHKMT150   TaHKMT31   TaHKMT42   TaHKMT21   TaHKMT37   TaHKMT48   TaHKMT27   TaHKMT66   TaHKMT55   TaHKMT78   TaHKMT4   TaHKMT16   TaHKMT44   TaHKMT23   TaHKMT33   TaHKMT116   TaHKMT99   TaHKMT108   TaHKMT14   TaHKMT8   TaHKMT2   TaHKMT107   TaHKMT100   TaHKMT115   TaHKMT29   TaHKMT40   TaHKMT17   TaHKMT11   TaHKMT4   TaHKMT90   TaHKMT85   TaHKMT94   TaHKMT80   TaHKMT56   TaHKMT67   TaHKMT79   TaHKMT64   TaHKMT84   TaHKMT76   TaHKMT104   TaHKMT112   TaHKMT97   TaHKMT53   TaHKMT63   TaHKMT74   TaHKMT11   TaHKMT10   TaHKMT17   TaHKMT5   TaHKMT130   TaHKMT120   TaHKMT125   TaHKMT26   TaHKMT36   TaHKMT47   TaHKMT32   TaHKMT22   TaHKMT43   TaHKMT35   TaHKMT25   TaHKMT46   TaHKMT49   TaHKMT38   TaHKMT152   TaHKMT73   TaHKMT52   TaHKMT62   TaHKMT149   TaHKMT143   TaHKMT137   TaHKMT102   TaHKMT110   TaHKMT118   TaHKMT113   TaHKMT105   TaHKMT72   TaHKMT61   TaHKMT51   TaHKMT71   TaHKMT50   TaHKMT60   TaHKMT96   TaHKMT103   TaHKMT111   TaHKMT86   TaHKMT89   TaHKMT93   TaHKMT146   TaHKMT134   TaHKMT140   TaHKMT141   TaHKMT135   TaHKMT147   TaHKMT57   TaHKMT68   TaHKMT124   TaHKMT119   TaHKMT129   TaHKMT114   TaHKMT75   TaHKMT139   TaHKMT28   TaHKMT19   TaHKMT39   TaHKMT151   TaHKMT98   TaHKMT106   TaHKMT145   TaHKMT41   TaHKMT20   TaHKMT87   TaHKMT92   TaHKMT88   TaHKMT132   TaHKMT122   TaHKMT127   TaHKMT136   TaHKMT142   TaHKMT148   TaHKMT15   TaHKMT3   TaHKMT9   TaHKMT91   TaHKMT83   TaHKMT95   TaHKMT133   TaHKMT128   TaHKMT123   TaHKMT101   TaHKMT117   TaHKMT109   TaHKMT45   TaHKMT34   TaHKMT24   TaHKMT30   TaHKMT54   TaHKMT65   TaHKMT77   TaHKMT18   TaHKMT6   TaHKMT12   TaHKMT82 TaHKMT59 be widely involved in the physiological activities of wheat growth and development. Spike, as the reproductive organ, exhibits all-round expression for the genes during vegetative and reproductive stages. This result also indicates that the gene expression levels were significantly different in different organs and tissues, which may be biased. For example, TaHKMT6 and TaHKMT18 showed obvious tissue specificity, which are characterized by high expression only in the leaves/shoots during the seedling stage, and a relatively low relative expression in young spikes (Fig. 5). There are many inert genes whose expression level is always very low, such as TaHKMT15 and TaHKMT41. Cis-regulatory elements analysis showed that light responsiveness related elements were the most extensive-elements in the promoter regions of TaHMT genes. Except for TaHMT151, all genes contained light responsiveness related elements SP1 (GGG CGG ) or GT1 motif (GGT TAA ), of which 29 TaHMTs also contained circadian elements. Most TaHMT genes (173/175) contained stress resistance related elements, such as MBS motif related to drought-inducibility, ABRE motif to abscisic A B Fig. 3 The phylogeny of HKMT (A) and PRMT (B) proteins from wheat, rice, and Arabidopsis. Wheat proteins are colored red and are subclade-specific, whereas rice and Arabidopsis genes are in blue and green, respectively. The HKMT proteins were divided into seven subfamilies and PRMT were divided into four subfamilies Fig. 4 The number of HMT genes identified per subfamily in A Arabidopsis, B rice, and C wheat. D The ratio of total HMT gene numbers to those in all subfamilies is shown for wheat: rice (red) and wheat: Arabidopsis (purple). In D the expected ratio (3:1) is indicated by a black line, and asterisks mark a significant deviation from the expected value using χ 2 test, P < 0.05 acid responsive, LTR motif to low-temperature responsiveness ( Figure S2). MicroRNAs (miRNAs) are a class of endogenous small noncoding RNAs which are involved in the regulation of gene expression at the posttranscriptional level by degrading their target mRNAs and/ or inhibiting their translation. We found that 11 different miRNAs had 15 targeting sites in body parts of 11 TaHMT genes (Table S4, Figure S6). Some miRNAs can act on multiple TaHMT genes, and one TaHMT gene can also interact with multiple miRNAs. Two miRNAs were found to inhibit expression of the target genes at the translational level: tae-miR5049-3p inhibiting gene TaHMT40 and tae-miR1134 inhibiting gene TaHMT172.

Expression analysis of HMT genes between parents and hybrids
Based on our transcriptome sequencing results of JM8 (BS366 × TY806), we obtained four genes whose relative expression level was significantly different (|fold change|> 2, p < 0.05) from that of their parents in spikes (Fig. 6B). These parts are TaHMT39, TaHMT49, TaHMT149, and TaHMT152, which may be involved in heterosis during the period of wheat reproductive growth in spikes. However, there are no significant DEGs screened in the tillering tissue. We conducted a profound analysis of the transcriptome sequencing data and the expression data downloaded from wheat-expression.com. We defined the transcriptome sequencing data as shoot-TR and spike-TR, respectively, and the data obtained from the website database was defined as shoot-WE and spike-WE, representing two organizations in two periods. The two parts showed the matching degrees of 56.67% in shoot tissue, with 43.33% in spikes (Fig. 6A, Table S5). Our expectation of seven active genes at high levels of expression in the four groups was met. Finally, among them, 12 genes were screened in shoots, and 15 were screened in spikes for further analysis. Together with the 1000 seed weight (TSW) and effective tillers per plant in wheat, four TaHMT DEGs and 27 genes obtained by the previous methods were analyzed by qRT-PCR. WGCNA was performed on the expression results. The relative expression level of each maternal plant was standardized as relative 1, and all of the expression results were shown in Figure S3. Based on the trait data of TSW, the expression of 19 genes in spikes showed low correlations on the whole. The results were divided into six modules, in which three DEGs (green modules), TaHMT49, TaHMT149, and TaHMT152 showed a certain correlation (R = 0.32, p = 0.04) (Fig. 6C). In general, their expression levels showed a positive correlation with the TSW in the samples. These three genes, on the whole, showed under-dominant expression patterns. In addition, their expression in transgressive materials was somehow higher than that of the low-parent materials corresponding to transcriptome sequencing data (Fig. 7). In the HP combination, the gene expression in F 1 were up-regulated by about 20 times at most (TaHMT49 in Combi201), while in some LP combinations, it was down regulated by more than 30 times (TaHMT49 in Combi199), compared with one parent. It is worth noting that this expression trend is not consistent in all materials. For example, TaHMT149 and TaHMT152 in Combi7 has low parental expression. This seems to be a manifestation of the complexity of heterosis. Interestingly, another DEG, TaHMT39, showed a low-parent dominant expression pattern in the transgressive materials (HP) and highparent dominant expression patterns in the low-parent materials (LP) (Fig. 8A).
The high-expression genes in spikes showed multiple expression patterns in all combinations. However, no consistent rule indicated that we could not obtain a more consistent conclusion. TaHMT71 in the M5 combination, TaHMT118 in the M6 combination, and TaHMT162 in the M1 combination had negative correlation expression patterns. In HP, the expression in F 1 was up-regulated by about 6 times, while in LP, the expression was down regulated by over 25 times at most (Fig. 8B). This means that in the HP materials, the genes presented a trend of low expression and a trend of high expression in the LP material. The expression patterns of TaHMT83, TaHMT84, and TaHMT148 were positively correlated in the M4 combination, and their expression trends were consistent with the TSW, showing an over-dominant expression pattern (Fig. 8C). Conversely, TaHMT86, TaHMT89, and TaHMT93 showed under-dominant patterns in some combinations (Fig. 8D).
For the analysis of the expression of the 12 genes in the tillers: as a whole, the expression of the 12 genes had a negative correlation with the number of tillers, but the trend was nevertheless not very significant. No significant correlation was seen between the first two modules and the number of tillers. Therefore, the gene may have no relation to the number of tillers. The expression of the six genes in the third modules (green part) was negatively correlated with the number of tillers (R = -0.34, p = 0.03). From the analysis of the expression of each combination, the TaHMT genes mainly showed a negative correlation (Fig. 9A). That is, , and TaHMT93 showed under-dominance patterns in M1 and M3 combinations. The relative expression of the maternal parent of each combination was relative 1, which is not shown in Fig. 8A but appears in the blue-colored box in Fig. 8B-D, marked as a dotted line. Red represents the relative expression of F 1 , and the yellow adjacent to the right represents the relative expression of its male paternity. + + marks applicability of all HP combinations, and -marks LP combinations. Values are given as means ± SD (n = 3)

Fig. 9 Expression analysis 12
TaHMT genes in tillers. A WGCNA analysis of 12 genes in tillers. B The expression of TaHMT50 and TaHMT122 in specific combinations showed a strong negative correlation with effective tiller number. The relative expression of the female parent of each combination was relative 1, which is shown in the blue-colored and marked as a dotted line. Red represented the relative expression of F 1 , and the yellow adjacent to the right represents the relative expression of its male paternity. + + marks applicability for all HP combinations, and -indicates LP combinations. R represents the Pearson correlation coefficient. Values are means ± SD (n = 3) the expression of F 1 was significantly down-regulated in HP and significantly up-regulated in LP. For example, TaHMT50 was down-regulated by 3-5 times in HP in M6 and M4, and up-regulated by 3 times in LP in M4. In particular, the expression of TaHMT122 in maternal M4 and TaHMT50 in M6 had a strong negative correlation (R > -0.6, p < 0.05) with the tiller number (Fig. 9B).

The abundance of lncRNAs were associated with histone methylation
Studies have shown that the lncRNAs are important regulation factors involved in the methylation of histone [41,42]. So far, a large number of lncRNAs have been found and studied in Arabidopsis, rice, and maize, which are obviously related to the development and reproduction of panicles [43,44]. We obtained four lncRNAs using the JM6 and JM8 lncRNA sequencing databases. Among the four DEGs, TaHMT39 and TaHMT49 showed a strong correlation with these lncRNAs (Fig. 10, Table 4). The expression of these four lncRNAs in the two-line hybrid wheat combinations of JM6(BS366 × GLDS) and JM8(BS366 × TY806) were higher than those in female parents, but lower than that of male parents, showing very similar expression patterns with their possible target genes.

Three TaCCA1 genes were regulated by histone methylation
The growth heterosis of Arabidopsis is related to the change of histone modification state in the promoter region of CCA1 and LHY. The apparent regulation of H3K9ac and H3K4me2 reduces the expression of CCA1 and LHY, while the expression of downstream genes controlling photosynthesis and starch metabolism increases accordingly, which improves the photosynthetic utilization efficiency, starch accumulation and growth heterosis of hybrid F 1 [20]. We predicted the methylation level of TaCCA1 promoter regions and gene body regions at seedling stage (Fig. 11). It was found that the three homologous TaCCA1s    was significantly higher than that in its parents, and the results of ChIP-qPCR showed that histone modification level in the 5 kb promoter regions were also significantly higher than that in its parents (Fig. 12).

Characteristics comparison of HMT genes in wheat and other diploid species
Allopolyploidization is the key event in the formation of common wheat as one of the main reasons for environmental adaptability [45]. Histone methylation modifying genes are widely seen in animals and plants. The increased multi-function of the SET gene evolution may ensure plants to be more adaptive to the complex natural environment [46]. In this study, the number of TaHMT genes identified was about three times than other five diploids and shows that the gene family has expanded. We found that 92.76% of the HKMT genes were triplet genes which is a typical characteristic of hexaploid wheat. Comparing the number of genes in each subfamily in wheat, Arabidopsis, and rice, the main reason is the expansion of gene number in class I E(z)-like and class V Su(var) 3-9-like, while PRMTs is relatively conservative. our study is consistent with previous studies which have shown that the SUV subfamily shows the rapid evolution [36]. We also found that ASH-like also has a high evolution speed. The ratio of this subfamily in wheat to its amount in rice and Arabidopsis was significantly higher than the average. Wheat has more orthologous genes and paralogous genes than other diploid species, which is a typical characteristics of a wheat heteropolyploid. According to the homology analysis, the ortholog group of wheat achieved a higher average size of 2.32, and therefore, every TaHMT gene had an average of 1.32 paralogs. This number is higher than other species, which supports the previous results in wheat [47]. In wheat, 67.11% of SET gene members belong to Suv, Ash, Trx and E(z) families, and most of them have colinearity support, showing that the diversity of SET gene is mainly caused by genome duplication. We also observed that, in addition to the SET protein domain, HKMT has evolved more new protein structures such as pre-SET, SRA, AWS, WIYLD, and SANT, among which SRA and WIYLD are structures plant-specific [48]. In fact, the presence of more auxiliary proteins such as pre-SET and post-SET may make the function of the SET protein more stable. Histone methylation can activate or repress gene expression according to different modified types. The modification types of each subfamily of HMT protein have been widely studied. For example, the activated sites (H3K36 and H3K4) include subfamilies of class II, class III, class IV and class VI, while the exercise inhibited sites (H3K9 and H3K27) contains the subfamilies of class I and class V. Unlike HKMTs, PRMTs are relatively conservative. The role of PRMTs in plants is far less clear than that of HKMTs [49]. Our results show that PRMTs are mainly located in the cytoplasm, unlike the nucleus location of HKMTs, and their physical and chemical properties tend to be consistent. According to the previous studies, PRMTs is mainly divided into three types of arginine methylation: ω-NG monthyllargine (MMA), ω-NG, NG-asymmetric dimethylargine (aDMA), and ω-NG, N'G-symmetric dimethylargine (sDMA) [50]. One type of PRMT can often catalyze two methylation modes. The same arginine measurement plays a major role in general regulation because of the capability of the PRMTs to deposit key activation functions (H4R3me2a, H3R2me2s, H3R17me2a and H3R26me2a) or repressive functions (H3R2me2a, H3R8me2a, H3R8me2a and H3R8me2s) [51]. The phylogenetic tree shows that wheat contains four types of PRMTs, while their catalytic sites and functions need further study.

Histone methylation may be involved in regulating heterosis in two-line hybrid wheat
The utilization of heterosis is the main mean of improving crop yield. Heterosis utilization can be accelerated by using male sterile lines. In wheat, TGMS lines (such as BS366) are the main means of heterosis utilization. Many excellent two-line hybrid wheat varieties with good yield performance were developed by TGMS as maternal materials [52]. The male sterility of TGMS can be inherited and has some epigenetic characteristics. Through the analysis of tissue-specific expression of TaHMTs using Chinses Spring (CS) wheat expression data in this study we found that spike was the concentrated expression tissue, and its expression was concentrated in the vegetative and reproductive period. This is consistent with the previous work, which showed that HMTs were closely related to the development of flower organs and flowering [53]. The transcriptome data of two-line hybrid wheat is to a certain degree consistent with CS expression data. They share a number of genes that are highly expressed in both spike and shoot. The expression analysis of hybrid combinations with different yield performance showed that the correlation between the relative expression of 31 genes and their yield traits is relatively low, which may be due to the lower gene expression levels in the experimental context, the complexity of quantitative traits controlled by multiple genes, and the uncertainty of apparent modification [54,55]. Among the known modes of microRNAs regulating target genes, the two main ways are miRNAs direct target mRNAs cleavage and translational repression [56]. Of the fifteen target sites we obtained, thirteen of them were predicted to act in a cleavage mode indicating a main way for miRNA to regulate HMT genes in wheat. In rice, some miRNAs have been identified to be involved in controlling rice grain size and panicle branching by targeting severe SPL genes, including OsSPL14 and OsSPL16 [57][58][59]. In our study, the expression of TaHMT162 in M1 combination, TaHMT71 in M5 combination showed a significant negative correlation with TSW, while the gene expression of TaHMT50 in M1 and M6 combination also showed a significant negative correlation with the number of effective tillers. In the regulation of miRNAs, there are cleavageregulation modes of tae-miR1122b-3p-TaHMT162 and tae-miR9677a-TaHMT50/71. We speculated that the histone methylation catalyzed by these three HMT genes might negatively regulate the expression of yield-related genes, and the control of miRNA reduces their mRNAs, resulting in low expression in HP but high expression in LP. Whether their regulatory roles are similar to those in rice need further more experimental evidences. It was also found that four DEGs had significant expression differences in JM8 hybrid combinations, which were also reflected in other two-line hybrid wheat combinations. Among the four DEGs, TaHMT39 and TaHMT49 exhibited a strong correlation with the expression of four lncR-NAs in the two-line hybrid wheat combinations of JM6 and JM8 with both cis-regulation mode and trans-regulation mode. This result also further indicated that these DEGs may function positively in the process of heterosis regulated by these lncRNAs. Heterosis is a complex biological phenomenon. The interaction networks provide a theoretical foundation for further study on the methylation of candidate genes and the influence of non-coding RNA on plant growth and development. The cooperative relationship between miRNA-lncRNA-HMT gene will be a promising study for the analysis of heterosis in wheat.
In Arabidopsis hybrids and allopolyploids, increased photosynthetic and metabolic activities are linked to altered expression of circadian clock regulators, including CIRCADIAN CLOCK ASSOCIATED1 (CCA1) [20]. The higher levels of carbon fixation and starch accumulation in the maize hybrids are also associated with altered temporal gene expression and overexpressing ZmCCA1b disrupts circadian rhythms and biomass heterosis [60]. However, this may have a different mechanism from that of dicotyledonous plants. In Arabidopsis, CCA1 modified by H3K9ac and H3K4me2 reduce the expression level, which improves the expression of downstream genes regulating photosynthesis and starch metabolism. Our findings indicated that the expression of three TaCCA1 genes in hybrid with heterosis were higher than that in parents, and the methylation modifications of H3K4me3/ H3K9ac and H4K36me3 activated gene expression in promoter regions were also significantly enrichment. This mechanism is similar to that of monocotyledonous maize, which may be an important way for histone methylation to participate in the heterosis of hybrid wheat. Although the relationship between gene expression and histone modification is consistent with previous knowledge [21,22], the differences of expression patterns of CCA1 among species and how to accurately control the expression of CCA1 by the histone modifications to achieve heterosis are worthy of in-depth research. At the same time, some important genes associated with heterosis for the location of histone methylation modifications are also noteworthy in the future.

Conclusion
In this study, two types of histone methylation genes, histone lysine methylation gene and histone arginine methylation gene, a total of 175 members were comprehensively identified in wheat as well as its relative species, and the functions of the genes in terms of structure, evolution and expression regulation were deeply explored. Through transcriptome sequencing and analysis of expression in different yield performance hybrid combinations, we found some potential DEGs that may be involved in the construction of yield-heterosis. Furthermore, the identification and expression analysis of lncR-NAs with strong correlation to the DEGs were conducted to elucidate their roles for yield-heterosis. Furthermore, the high expression and high histone modification levels of CCA1 in hybrid F 1 indicated that histone methylation played important potential roles in regulating yield traits heterosis. Our results accumulated the original data for the cloning of functional genes in the future, and laid a certain data foundation for in-depth research on the growth and development of wheat and the improvement of yield.

Identification of HMT genes in wheat and its relatives
To identify the putative HMT genes, the wheat genome sequence, gene annotation files, and protein sequences were downloaded from the ENSEMBL database (http:// plants. ensem bl. org/ Triti cum_ aesti vum/ Info/ Index: version: release-46, as well as http:// oct20 17plants. ensem bl. org/ index. html: TGACv1). The genome and protein data for Sorghum bicolor, Hordeum vulgare, and Brachypodium distachyon were downloaded simultaneously. The Hidden Markov Model (HMM version 33.0) was conducted using the profiles of the SET domain (Pfam accession No: PF00856) as queries with the hmmsearch program of all annotated proteins of these species (E < 1e −10 ). BLAT was used to compare the protein sequences of each species with the Pfam sequence of the gene family (the following analysis software used the default parameters unless otherwise specified). The modified DOT1L (Pfam accession No: PF08123), which participates in H3K79 methylation, was the only lysine methyltransferase that has so far been found without the SET-conserved domain [61]. We did not find genes containing the DOT1L conserved domain in wheat.
PRMT5 is the most important type II protein arginine methyltransferase. It catalyzes the formation of symmetrical arginine demethylation [62]. Consequently, the Hidden Markov Model was also conducted using the HMM profiles of PRMT5 domain (Pfam accession No: PF05185). The nine reported Arabidopsis PRMTs and the eight reported rice PRMTs were used to conduct a BLASTP program (Version: blast-2.10) with a wheat protein database. The standard of the BLASTP is as follows: (1) e-value < 1e −10 ; (2) bit-score > 100 and global alignment of identity > 50% overlap, that is, the length of the matching length must exceed 50% of the longer sequence. All of the gene loci were numbered and mapped onto corresponding chromosomes, and the non-chromosome sequences scattered across the genome were merged into unfound-chromosomes (U). The name of each gene begins with Ta, an abbreviation for Triticum aestivum L., and the gene loci were re-numbered based on the order of A/B/D subgroups.

Evolutionary analysis of wheat TaHMT gene family
Multiple sequence alignment and the phylogenetic analysis of the HMT gene family were performed using MUSCLE. InParanoid 4.1 was adopted to analyze the homologous protein (OG, orthologous groups) for every two species, and Multi-Paranoid was used for analyzing multiple species OG [72]. All HMT genes were mapped to their respective locus in the wheat genome in a circular diagram using shinyCircos [73]. The HMT proteins for wheat, rice, and Arabidopsis were introduced into MEGA-X software to construct phylogenetic trees using Maximum likelihood method [74]. In order to obtain more reliable evolutionary analysis, we constructed the phylogenetic trees of HKMT and PRMT of six species at the same time in Figure S5.

Expression analysis of TaHMT gene family
From the expVIP-Wheat Browser (http:// wheat-expre ssion. com/ version: TGACv1), the expression data of root, shoot, leaf, spike, and grain of all TaHMTs in seedling, vegetative, and reproductive growth stages were downloaded. The heatmap for gene expression patterns was generated with the online toolshttps:// www. chipl ot. onlin e/# Heatm ap refer to the instructions on the website. The 28 TaHMT genes with the highest expression of shoots at the seedling stage and spikes at the reproductive growth stage were screened. In screening for differentially expressed genes (DEGs) from the parents, the two-line hybrid wheat JM8 (Jing Mai 8), cultivated by Beijing Hybrid Wheat Engineering Technology Research Center, was used for RNA sequencing. The transcriptome data can be found in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession number PRJNA398700 (https:// www. ncbi. nlm. nih. gov/ biopr oject/? term= PRJNA 398700) from JM8 and parental ones (BS366 × TY806) [75]. The DEGs in each comparison from among all TaHMT genes were identified by EdgeR, using a significance threshold of P < 0.05 and a fold change ≥ 2. The first 30 TaHMT genes with the highest expression were also screened. To clarify the relationship between the two groups' data, we used a Venn Diagram (http:// bioin forma tics. psb. ugent. be/ webto ols/ Venn/) to analyze all of the highest-expression genes.

Plant materials and treatments
All plant materials used in this article were provided by Beijing Engineering Research Center for Hybrid Wheat. Two-line hybrid wheat JM6, JM8 and other materials have been approved and widely planted in China. A total of 201 two-line hybrid wheat hybridized combinations were prepared with six female parents (M1, M2, M3, M4, M5, and M6) and 34 male parents. These materials, grown at the experimental farm in Beijing Haidian District (China, N39°54′, E116°18′), were regularly watered and fertilized. Each material was planted in three rows as three biological replicates. The corresponding next generations were numbered Combi1 to Combi201 (also 1 to 201 in figures), and the male parents were identified with a prefixed P to the label, such as in P5 or P13. Tillers from root base to stem about 0.5 cm in length were obtained, which contained the apical meristem. As the wheat grew into the anther separation stage, we sampled the spikes with the awn removed. All plant materials used for the above studies were frozen immediately in liquid nitrogen and stored at -80 °C for RNA extraction.

Collection of yield phenotypes of multiple two-line hybrids
When the wheat was fully matured, all of the samples were collected according to the names of the materials. The number of tillers and spikes with well-grown individuals was investigated. We selected five wheat plants randomly and counted tillers with mature grains. The mature grains were dried and threshed, and 1000 seeds were weighed three times. The yield character data obtained above were recorded, and finally, the dominant differences in F 1 and parents were evaluated by Student's t-test, p < 0.05. The phenotypes and data statistics were listed in Supplementary Figure S4 and Table S6.

Correlation analysis
The expression of the genes was identified as overdominance or under-dominance if the expression level of a gene in F 1 was significantly higher or lower than, respectively, the expression level in both parents. The high-parent dominant genes and low-parent dominant genes were defined such that the expression level in F 1 genotype was significantly different from that in only one parent. Phenotypic data were processed in a similar way to the genotype expression. The materials were divided into high-parent material (HP), middle-parent material (MP), and low-parent material (LP). The yield data of five maternal parents in the hybrid combinations were homogenized. Pearson analysis was conducted to calculate the correlation between yield-character data and gene-expression data after homogenization using SPSS27.0 software. The Pearson correlation coefficient (R) was calculated, and the p-value was considered statistically significant when it was below 0.05. To identify discrete groups of co-expressed genes showing their relationship to multiple traits, we applied TKW/Tillers-HMT weighted gene co-expression network analysis (WGCNA) to integrate expression differences into a higher order. The network was built using 42 wheat samples. Eigengenes (the normalized linear combination of genes with the largest variance in a population) carry the representative gene expression pattern for each module. The association between co-expressed gene modules and phenotypic traits was further assessed using Pearson correlations. Based on our sequencing data of lncRNA of Two-line Hybrid Wheat JM6 and JM8 combinations, we compared the lncRNA with known mRNA and used the results of Cufflinks to screen candidate lncRNA. According to the structural characteristics and noncoding functional characteristics of lncRNA, the assembling results based on StringTie software are screened. LncRNA functions mainly through "cis" or "trans" acting on protein coding target genes. For the prediction of cis target genes, we screened the (upstream and downstream 100 k) protein coding genes near the lncRNA as their target genes. Subsequently, the main functions of lncRNA were predicted by target gene function enrichment analysis. For the prediction of "trans" target genes, we used Pearson correlation coefficient to analyze the correlation between lncRNA and target genes expression among samples, and took the protein coding genes with high correlation for function enrichment analysis to predict the main functions of lncRNA. The Pearson correlation coefficient (R) was calculated using SPSS 27.0 and a multiple linear regression analysis was applied in the bivariate analysis. A t-test was used for comparison among groups.

qRT-PCR and ChIP-qPCR
For the expression analysis of HMT genes, total RNA was extracted from plant tissues using TRIzol reagent (Ambion, USA) according to the manufacturer's instructions. First-strand cDNA synthesis was performed using a TaKaRa PrimeScript ™ RT Reagent Kit with gDNA Eraser (TaKaRa, Dalian, China). qRT-PCR analysis was conducted using an CFX96 Touch ™ Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA) with Takara SYBR ® Premix Ex Taq ™ (Tli RNaseH Plus). The primers for all genes were designed using the Prim-er3plus online tools (http:// www. prime r3plus. com/ cgibin/ dev/ prime r3plus. cgi). Each reaction was performed in triplicate in a reaction volume of 10 µL. Expression levels of genes in samples were normalized using endogenous wheat 18 s gene; the relative expression levels were calculated using the 2 −ΔΔCt method [76]. Chromatin Immunoprecipitation (ChIP) Assay Followed by qPCR (ChIP-qPCR). Chromatin immunoprecipitation assays