Genome wide analysis of MADS-box gene family in Brassica oleracea reveals conservation and variation in flower development

Background MADS-box genes play important roles in vegetative growth and reproductive development and are essential for the correct development of plants (particularly inflorescences, flowers, and fruits). However, this gene family has not been identified nor their functions analyzed in Brassica oleracea. Results In this study, we performed a whole-genome survey of the complete set of MADS-box genes in B. oleracea. In total, 91 MADS-box transcription factors (TFs) were identified and categorized as type I (Mα, Mβ, Mγ) and type II (MIKCC, MIKC*) groups according to the phylogeny and gene structure analysis. Among these genes, 59 were randomly distributed on 9 chromosomes, while the other 23 were assigned to 19 scaffolds and 9 genes from NCBI had no location information. Both RNA-sequencing and quantitative real-time-PCR analysis suggested that MIKC genes had more active and complex expression patterns than M type genes and most type II genes showed high flowering-related expression profiles. Additional quantitative real-time-PCR analysis of pedicel and four flower whorls revealed that the structure of the B.oleracea MIKC genes was conserved, but their homologues showed variable expression patterns compared to those in Arabidopsis thaliana. Conclusion This paper gives a detailed overview of the BolMADS genes and their expression patterns. The results obtained in this study provide useful information for understanding the molecular regulation of flower development and further functional characterization of MADS-box genes in B. oleracea. Electronic supplementary material The online version of this article (10.1186/s12870-019-1717-y) contains supplementary material, which is available to authorized users.


Background
The MADS-box family consists of genes encoding a class of transcription factors characterized by the presence of 58-60 conserved amino acids known as the MADS domain [1]. The MADS domain was named after the first four members of this family encoded by genes, MINICHROMOSOME MAINTENANCE 1 (MCM1) from yeast [2], AGAMOUS (AG) from Arabidopsis thaliana [3], DEFICIENS (DEF) from Antirrhinum majus [4], and SERUM RESPONSE FACTOR (SRF) from Homo sapiens [5]. MADS-box genes are widely distributed in eukaryotes and play important roles in an organism's growth and development, particularly in floral organ differentiation, flowering time regulation, and fruit development and ripening in angiosperms [6,7].
Based on phylogenetic analysis, the MADS-box genes are divided into two categories: type I and type II [8]. All type II members in yeast, animal and plant are MEF2-like genes, while type I MADS-box genes in plant are SRF-like [8]. In addition to their common MADS (M) domain, MIKC genes also include intervening (I), keratin-like (K), and C-terminal (C) domains [9,10]. M is the most conserved domain and located in the N-terminal region, which binds specific DNA through the CArG-box (5′-CC(A/T) 6GG-3′) in the promotor region of the target gene. I is a less conserved region consisting of approximately 30 amino acids and determines the specific binding of transcription factors to DNA and dimer formation. The K domain is the second conserved region forming a coiled-coil structure, which facilitates interactions between proteins. The C-terminal domain is the least conserved region, consists mainly of hydrophobic amino acids, and plays an important role in the formation and transcriptional activation of protein complexes [6,11,12]. Based on sequence and structure differences of the I domain, the MIKC genes are further divided into MIKC C -and MIKC * -types. MIKC C -type genes are the most widely studied MADS-box genes because they are essential in plant growth and development [13,14]. According to the phylogenetic analysis of the M domain, type I genes (also known as M type genes) in plant are divided into four groups, Mа, Mß, Mγ, and Mδ. In Arabidopsis, Mδ genes are also assigned as MIKC * -type genes based on their relatively close relationships with MIKC C -type genes [15].
Brassica oleracea contains multiple vegetable crops including broccoli, cauliflower, brussels sprouts, various types of cabbage, kohlrabi, and kale. Cauliflower is an economically important vegetable planted worldwide [35]. Compared with other crops of B. oleracea, cauliflower has a unique characteristic -a tight white flower curd, consisting of a large amount of shortened branches of indeterminate inflorescences [36]. Previous studies showed that the formation of cauliflower curd could be related to two MADS-box genes BoCAL and BoAP1, and in cauliflower the bocal mutant allele is present [37,38]. However, other studies found only a weak connection between cauliflower phenotype and boap1 and bocal mutations [39,40]. Moreover, Lan and Paterson reported that curd-related traits are affected by 86 quantitative trait loci (QTLs), indicating that the cauliflower arrest is multi-gene controlled [41]. Genome-wide studies of MADS-box genes can be conducted, as the B. oleracea genome has been sequenced [42]. In this study, a total of 91 MADS-box genes were identified in the B. oleracea genome. The phylogenetic relationships, gene structure, chromosomal locations, and conserved motifs of the encoded proteins were analyzed. The tissue-specific expression of all BolMADS genes in cauliflower were further studied. This work provides useful information regarding the molecular mechanisms underlying cauliflower curd formation and flower development.

Identification of MADS-box genes in B. oleracea
A total of 95 candidate genes were identified. Based on sequence analysis, four candidate MADS-box genes composed of approximately 200 bases were confirmed to contain incomplete MADS-box domains and then were excluded from the analysis. Therefore, a total of 91 MADS-box genes with complete and functional structures were identified in the B. oleracea genome. Among them, 82 genes were found in the B. oleracea genome database (named BolMADS1-BolMADS82), and 9 genes were acquired from the NCBI database (named with NCBI access numbers).
The MADS-box genes in B. oleracea genome had coding sequence lengths of 249-1221 base pairs. The encoded proteins length ranged from 82 to 406 amino acids with predicted molecular masses of 9.52-46.31 KDa and protein isoelectric points of 4.08-10.53 (Table 1).

Phylogenetic analysis of MADS-box gene family
A phylogenetic tree was constructed based on the sequence of full-length MADS-box proteins from B. oleracea and A. thaliana using the neighbor joining method (Additional file 1: Figure S1). The 91 BolMADS genes were classified into two categories, type I (49) and type II (42). Independent phylogenetic trees of both types were constructed using MADS genes from both species (Fig. 1). Type I BolMADS genes were further divided into three subgroups: Mα, Mβ, and Mγ. The Mα subfamily was the largest, possessing 25 genes, while the Mβ and Mγ subgroups showed nearly the same number of genes with 11 and 13, respectively.
In type II BolMADS genes, we found 36 MIKC C -type and 6 MIKC*-type genes. A total of 12 MIKC C evolutionary branches were detected according to the known  (8) of BolMADS MIKC C -type genes. We found two subgroups of AGL6-like and AGL17-like genes which consisted of only one member each. In contrast with A. thaliana, B. oleracea AP1-, AGL12-, TT16/PI-, and AP3-like gene clusters are expanded by gene duplication, whereas TM3-, and AGL17-like genes do not have paralogs.
Using the neighbor joining method, another phylogenetic tree was constructed with the full-length of MADS-box protein sequences from B. oleracea and B. rapa (Additional file 2: Figure S2). In general, most MADS genes in B. rapa showed repetition and expansion compared to those from B. oleracea. Moreover, type II MADS-box genes showed higher expansion degrees than type I genes. In type II MADS genes, the TM3 subgroup showed the highest gene expansion degree, with a total of 16 TM3 genes in B. rapa but only three in B. oleracea. The AP3 and AGL12 subgroups contained nearly the same number of genes in these two Brassica species.

Gene structure and conserved motif distribution analysis
The gene structure and intron/exon arrangements of the BolMADS genes were determined by comparing their full-length cDNA and genomic DNA sequences. All type II BolMADS genes contained at least three introns except for BolMADS16 and BolMADS23. BolMADS16 showed the most conserved domain of M with no introns and BolMADS23 showed two introns. In type I genes, Bol-MADS34, 42, 67, 78, 80, and 82 were found to have one intron, while the others were intronless ( Table 1).
The structures of proteins produced by BolMADS genes were analyzed using MEME online software. Ten conserved motifs, named as motifs 1-10, were identified (Fig. 2). Motif 1 and 5 were corresponding to the typical MADS domain. The main motif 1 was found in all BolMADS proteins. Motif 2 specified the K domain was found in most MIKC C type genes except BolMADS9, 12, 16 and 23, which contained relatively short amino acid sequences. For the six MIKC * type proteins, BolMADS29 and 33 contained motif 2, while the others have not. The K domains typical of type II proteins were missing in all type I BolMADS proteins. N-terminal motif 6 was found in all type II and in the majority of type I proteins. Motifs 4 and 8 were specific to some of the Mα proteins, motif 10 -to some of the Mα and Mβ proteins, while motifs 3 and 9 -to some of the Mγ proteins.

Chromosome distributions of MADS-box genes
The physical locations of BolMADS genes were mapped to the 9 chromosomes of B. oleracea (Fig. 3). Of the 91 genes, 59 were randomly distributed on 9 chromosomes, 23 were assigned to 19 scaffolds and 9 from the NCBI database had no location information. For MIKC-type genes, 26 were mapped on 8 chromosomes (except Chr5). Chromosomes 3 and 9 contained the maximum number of six genes, while only one gene was found on chromosome 1. The other 7 MIKC-type genes were randomly distributed on 7 scaffolds. Type I genes showed a completely different distribution. First, of the 49 type I genes, 31 scattered nonrandomly on all 9 B. oleracea chromosomes. Chromosomes 2 and 4 had the largest number of genes and the other chromosomes contained 2-5 genes. Second, the remaining 16 genes were distributed on 12 scaffolds. Sca194_P1 had the largest number of genes, showing three genes. Sca153 and Sca192 each contained two genes. There was one gene on each of the other scaffolds.
Duplicated genes including segmental and tandem duplications were detected in the MADS-box gene family of B. oleracea. There were higher frequencies of segmental duplications and a total of 10 genes contained corresponding homologous segments. In contrast, only three groups of genes were contained tandem duplications, which all related to type I genes. The first group containing BolMADS37, 38, and 34 was on chromosome 2 and the second group containing BolMADS49 and 50

Genome-wide expression analysis of MADS-box genes
The RNA-seq data of 82 MADS-box genes were downloaded from B. oleracea Genomics Database (http:// www.ocri-genomics.org/bolbase/) and the other 9 genes from the NCBI database had no information of RNA-seq. The genes were found to be differently expressed in six tissues including the callus, root, stem, leaf, flower, and silique. Two heat maps were constructed for type I and type II genes (Fig. 4). The overall expression of type II genes was more active and diverse than that of type I genes. A total of 30 genes showed no expression or their transcription levels were too low to be detected on the heat map. Of the 30 genes with no or low expression, 22 were type I and 8 were type II. In all six tissues, only the FLC-like gene BolMADS22 was expressed.
Majority of MIKC-type genes (22 of 33), belonging to the AP1, SEP, AG, AGL12, AGL15/18, and FLC subfamilies, were expressed in the flower and /or silique. BolMADS19 and 21 (AP3-like) were consistently and specifically expressed in flowers. However, most genes within one subfamily often showed different expression patterns. Among the six AP1 genes, three (BolMADS1, 2, 3) showed high expression levels in the flower and silique and three genes (BolMADS10, 16,20) showed no expression in the six tissues. In the AGL12-like subfamily containing three genes, BolMADS26 was highly expressed in the flower, while the other two genes showed no or very low transcription in the six tissues. Interestingly, six MIKC * genes also displayed very different expression patterns. Four genes (BolMADS29, 30,31,33) showed no or low expression levels. The other two genes (BolMADS28, 32) were highly expressed in flowers and BolMADS32 was detected in the leaf and callus. Compared to MIKC genes, type I genes were inert and their expression patterns were simple. A transcription of only 12 genes was detected, while the other Fig. 2 Distribution of conserved motifs in B. oleracea MADS-box proteins identified using MEME genes were silent. These 12 genes were expressed in the silique and BolMADS75 showed high expression in the flower, leaf, stem, and callus.

Expression patterns of MADS-box genes in different tissues of cauliflower
MADS-box genes were reported to have diverse functions in plant growth and development, particularly in floral organs specification. Therefore, we examined the expression patterns of 87 B. oleracea MADS-box genes in seven different tissues of cauliflower including the root, stem, leaf, curd, bud, flower, and silique by qRT-PCR (Fig. 5). A total of 48 genes were expressed in at least one tissue, while the other 39 genes showed no or very low expression. Among the 48 expressed genes, 35 belonged to the MIKC type and the other 13 were type I.
All MIKC type genes were highly expressed in reproductive organs including the curd, bud, flower, and silique except for BolMADS15 (SVP-like), 16 (AP1-like), 20 (AP1-like), and 27 (AGL12-like), which showed higher transcription levels in vegetative tissues (root, stem, and leaf ). We found that not only AP1-like genes (BolMADS1, 3, 10, XP_013590290.1, XP_013589842.1), but also AP3-like BolMADS21 gene, SEP1-like Bol-MADS5 gene, SVP-like XP_013600280.1 gene and PI-like XP_013626307.1 gene showed relatively high expression levels in the curd, which is a specific reproductive organ in cauliflower. This result indicates that these genes play important roles in cauliflower curd formation and also confirmed the results of a previous study showing that AP1-like genes control flower primordium development. In addition, XP_013590290.1 and XP_013589842.1 were also highly expressed in the bud and flower, and XP_013590290.1 showed the highest expression level in For the 45 tested type I genes, 13 showed different expression patterns. BolMADS71 showed the highest expression level in the flower. BolMADS40 and 70 were specifically expressed in the root, indicating roles in root development. The other three genes (BolMADS67, 72, and 73) showed high transcription levels in the silique and may be involved in silique development.

Key MADS-box genes involved in floral organ development
A total of 21 MIKC type genes and one type I gene showing relatively high expression levels in the flower were selected to further investigate their roles in floral organ development. The expression patterns of these genes were detected in B. oleracea pedicel, sepal, petal, stamen, and pistil by qRT-PCR (Fig. 6). Three AP1/FUL-like (BolMADS1, 2, 3), two AP1/AP1-like (XP_013590290.1 and XP_013589842.1) and one AP1/CAL-like (BolMADS 10) orthologous genes showed different expression modes. BolMADS1, 2, and 3 showed relatively higher expression levels in the pistil than in other flower organs, while BolMADS10 showed low expression in all five flower organs. XP_013590290.1 and XP_013589842.1 highly expressed in sepal and petal, and XP_013590290.1 showed the highest expression level in sepal. AP3-like BolMADS19 and 21 and PI-like XP_013626307.1 and XP_013607408.1 were characterized as B class genes. These two groups of genes were specifically expressed in the petal and stamen and XP_013626307.1 showed the highest expression level in these two floral organs. BolMADS13 and 14 of the AG-like subfamily (C/D class) showed a similar expression pattern as BolMADS17 and 18 from the GGM13-like subfamily, which were characterized as B sister genes (Bs class). These two groups of genes were preferentially expressed in the pistil. In addition to expression in the pistil, another AG-like XP_013597287.1 gene also demonstrated trace expression in the stamen. BolMADS5 and 6 belonging to the SEP-like subgroup (E class) were detected in all four floral organs. The above results are consistent with the classical ABCDE flower model. These two genes were expressed in all the five flower organs except the pistil. The overall expression level of XP_013632159.1 was much higher than that of XP_013630326.1. BolMADS28 of the MIKC * type was specifically expressed in the stamens and its expression level was the second highest among all B. oleracea MADS-box genes, indicating its involvement in stamen development. One type I gene, BolMADS71, was preferentially expressed in the petal at relatively high levels. This indicates that BolMADS71 also participates in the process of flower development.

Discussion
Cauliflower is one of the most important vegetables grown worldwide. Curd formation, floral organ development, and flowering time are important agronomic traits in cauliflower breeding and production for directly determining its adaptability and commerciality [36]. The MADS-box genes, particularly the MIKC family members, play important roles in plant flower development. Previous studies reported the characterization of MADS-box genes in several Cruciferous crops including A. thaliana and B. rapa [15,43]. With the completion of the B. oleracea genome sequence, the BolMADS genes can be systematically identified and analyzed [42]. In this study, we identified a total of 91 BolMADS genes and comprehensively analyzed these genes to determine their phylogenetic relationships, chromosomal locations, gene structures, and expression patterns.

Causes of MADS-box genes loss in B. oleracea
Approximately 20-40 Mya, one whole genome duplication (WGD) event occurred in the Brassicaceae genome, after which the B. oleracea genome underwent another two whole genome triplication (WGT) events against Arabidopsis [44][45][46]. This caused the B. oleracea genome to become nearly 5-fold larger than the Arabidopsis genome. However, the number of MADS-box genes in B. oleracea is not directly proportional to the genome size. Only 91 BolMADS genes were identified, which is lower than the numbers in A. thaliana and B. rapa [15,47]. This may be for two reasons. First, after genomic duplication, plants often show large areas of gene loss and chromosome rearrangements to maintain a metabolic balance [45]. Thus, most BolMADS genes may have been lost through later evolutionary processes. Second, approximately 4 million years ago, B. oleracea was differentiated as a branch [46]. Since then, this species has been subjected to strong artificial and natural selection, resulting in the loss of different BolMADS genes.

Organization of BolMADS genes
Gene organization was reported to play important roles in the evolution of multigene family [48][49][50]. In this study, the MIKC type genes contained a much larger number of introns compared to M-type genes. Among the MIKC type genes, 30 of 33 contained more than 3 introns, whereas all M-type genes had no or only one intron. The same intron distributions have been observed in other species, such as Arabidopsis, Chinese cabbage, rice, and cotton [15,45,51,52]. This can be interpreted as a difference between type I and II MADS genes in their tendencies for deletion or acquisition of introns and indicate evolutionary conservation between different plant species [53]. BolMADS genes within the same group had a similar motif construction with common evolutionary ancestors. The consistency in intron number, motif structure, and phylogeny demonstrate that these genes were correctly classified.

Expression patterns of MIKC type genes in cauliflower
Most MIKC type genes were not expressed or lowly expressed in vegetative organs, including the roots, stems, and leaves, and were abundantly expressed in the reproductive organs, suggesting that MIKC type genes play important roles in the growth and development of reproductive tissues.
The classical model of flower development in plant is the ABCDE model. In A. thaliana, four A class genes have been cloned: AP1, CAL, FUL, and AGL79. AP1 acts as a floral organ identity gene to promote the development of petals and sepals and specifies floral meristem identity [16,54]. However, CAL, which is an AP1 paralogue, has only partial functions of AP1 and promotes floral meristem formation during flower development [19,20]. In cauliflower, two AP1 (XP_013590290.1 and XP_013589842.1) and one CAL homologous genes (Bol-MADS10) were detected and they were highly expressed in the curd tissue. Curd is the edible organ of cauliflower, which is composed of inflorescence meristems. This result indicates that these three genes play important roles in curd formation. FUL has a wide range of functions: it can play a role in the flower organ specification and function in carpel development [55,56]. A total of three FUL homologous genes (BolMADS1, 2, and 3) were detected and showed similar expression patterns with relatively higher expression in the petals and ovary than in the other floral organs.
The main function of the B class genes (AP3 and PI) is to control the development of stamens and petals, particularly in the decision to form male reproductive organs [21]. Two AP3-like (BolMADS19 and 21) and two PI-like (XP_013626307.1 and XP_013607408.1) B class genes were detected in cauliflower and were highly expressed in the petals and stamens, indicating similar functions.
A total of 4 genes in Arabidopsis belong to the C/D class, including AG, SHP1 (also known as AGL1), SHP2 (also known as AGL5), and STK (also known as AGL11). These genes are mainly involved in the development of stamens, carpels, ovules, and fruit [4,22,23]. The Bol-MADS4, 13, 14 and XP_013597287.1 genes of the C/D class were preferentially expressed in the pistil, indicating their participation in pistil development.
E class genes (SEP) have obvious partially redundant functions during flower development. Moreover, these genes do not function alone in flower development, but function through interactions with ABC genes. In different species, SEP-like genes have different degrees of sub-functionalization and neo-functionalization [25,57].
In cauliflower, three SEP-like genes were identified, with BolMADS5 showing the highest overall expression level in the flower. BolMADS5 was expressed in all five floral organs, but its expression level was highest in the petals, followed by in the ovary and stamen, indicating its roles in the development of these three floral organs.
Interestingly, BolMADS28 (MIKC * type) was specifically expressed in the stamens and its expression level was highest among all MIKC type genes. Its homologue in Arabidopsis is AGL104, which is required for pollen maturation and pollen tube growth [58]. BolMADS28 may be a new gene that plays an important role in stamen development in cauliflower.

Expression profiles of M type genes in cauliflower
Compared to MIKC MADS-box genes, the information about functions of M type genes are very limited. Several studies in Arabidopsis indicated that some M type genes participate in plant growth and reproduction, particularly in defining the female gametophyte and in the development of the embryo and endosperm [58][59][60]. In this study, 13 of 45 (28.89%) M type genes were detected in different tissues. Most detected M type genes were mainly expressed in reproductive tissues, while other several genes showed specific expression in the roots, such as BolMADS40 and 70. BolMADS71 showed the highest expression level in the flower tissue, mainly in the petals. This suggests the participation of Bol-MADS71 in petal development. BolMADS67, 72, and 73 were the top most abundantly expressed genes in the silique, indicating their roles in silique development.

Conclusions
In conclusion, 91 genes were identified in the Brassica oleracea genome. These genes were divided into 42 type II and 49 type I genes, and the type II genes were further divided into 13 subfamilies. The exon/intron structures, conserved motif distributions, and chromosomal locations of MADS-box family members in B. oleracea were also determined. Expression analysis of the BolMADS genes indicated that most MIKC C genes participated in flower development, which is consistent with the ABCDE model. Several non-MIKC C genes were also found to be highly expressed in the stamens and petals, indicating their important roles in the development of these floral organs.

Identification of MADS-box gene family in B. oleracea
The sequences of Arabidopsis MADS-box genes were obtained from the TAIR database (http://www.arabidop sis.org) and used as queries for a BLASTP algorithmbased against the B. oleracea genome database (http:// www.ocri-genomics.org/bolbase/). For all candidate genes, we also examined whether they contain the MADS domain (PF00319) in the SMART (http://smart. embl-heidelberg.de) and Pfam (http://pfam.sanger.ac.uk) databases. Sequences without a MADS domain were deleted.

Conserved motifs and gene structure analysis of MADSbox proteins
Conserved motifs in MADS proteins of B. oleracea were identified using the program Multiple Em for Motif Elicitation (http://meme-suite.org/tools/meme). Default parameters were selected except that the maximum number of motifs was set to 10. The BolMADS gene structure was predicted using the program of GSDS2.0 (Gene Structure Display Server, http://gsds.cbi.pku.edu.cn/) for both genome and coding domain sequences.
Phylogenetic tree construction and protein conserved domain sequence alignment MADS proteins of B. oleracea with A. thaliana and B. rapa [47] were aligned respectively using ClustalW with default settings. The two phylogenetic trees were constructed with MEGA5.04 software using the neighbor-joining method. The bootstrap test was executed by 1000 replications. The resulting phylogenetic tree was prepared in FigTree (v1.3.1) software.

Chromosomal location and gene duplication
The physical positions of all MADS-box genes along each chromosome were identified from the B. oleracea database. Gene duplication information was obtained by aligning all B. oleracea MADS-box genes. The criteria of both the query coverage rate and identity of the aligned region were more than 80%.

Plant materials and tissue-specific expression analysis
In this study, RNA-seq data from the B. oleracea Genomics Database (http://www.ocri-genomics.org/bolbase/) were used to analyze the expression patterns of MADS-box genes in different tissues of cultivated B. oleracea plants.
Seven types of tissues from cauliflower (a DH line from cauliflower cultivar "QingNong 65") were used for quantitative real-time (qRT)-PCR analysis of all MADS-box genes. These tissues included the root, stem, leaf, curd, bud, flower, and silique. MADS-box genes with relatively higher transcription levels in the flower tissue were selected for qRT-PCR analysis in pedicel and four whorls of flower organs, including the sepal, petal, stamen, and pistil.
Approximately 100 mg of frozen tissue was collected for total RNA isolation using an RNA kit (RNAprep Pure Plant Kit, Tiangen, China) according to the manufacturer's instructions. One microgram of total RNA was reverse-transcribed to the first-strand cDNA using the