An Actinidia chinensis (kiwifruit) eFP browser and network analysis of transcription factors

Transcriptomic studies combined with a well annotated genome have laid the foundations for new understanding of molecular processes. Tools which visualise gene expression patterns have further added to these resources. The manual annotation of the Actinidia chinensis (kiwifruit) genome has resulted in a high quality set of 33,044 genes. Here we investigate gene expression patterns in diverse tissues, visualised in an Electronic Fluorescent Pictograph (eFP) browser, to study the relationship of transcription factor (TF) expression using network analysis.


Abstract Background
Transcriptomic studies combined with a well annotated genome have laid the foundations for new understanding of molecular processes. Tools which visualise gene expression patterns have further added to these resources. The manual annotation of the Actinidia chinensis (kiwifruit) genome has resulted in a high quality set of 33,044 genes. Here we investigate gene expression patterns in diverse tissues, visualised in an Electronic Fluorescent Pictograph (eFP) browser, to study the relationship of transcription factor (TF) expression using network analysis.

Results
Sixty-one samples covering diverse tissues at different developmental time points were selected for RNAseq analysis and an eFP browser was generated to visualise this dataset. 2,839 TFs representing 57 different classes were identi ed and named. Network analysis of the TF expression patterns separated TFs into 14 different modules. Two modules consisting of 237 TFs were correlated with oral bud and ower development, a further two modules containing 160 TFs were associated with fruit development and maturation. A single module of 480 TFs was associated with ethylene-induced fruit ripening. Three "hub" genes correlated with ower and fruit development consisted of a HAF-like gene central to gynoecium development, an ERF and a DOF gene. Maturing and ripening hub genes included a KNOX gene that was associated with seed maturation, and a GRAS-like TF.

Conclusions
This study provides an insight into the complexity of the transcriptional control of ower and fruit development, as well as providing a new resource to the plant community. The eFP browser is provided in an accessible format that allows researchers to download and work internally.

Background
Global transcriptomic approaches are a common tool used to obtain a better understanding of gene function and regulation. The composition of the transcriptome is the result of a dynamic balance between chromatin state, the activation of gene expression by transcription factors (TFs) and the speed of transcript degradation. The combination of good genomic information and robust gene models paves the way for systematic and consistent gene and gene family naming. This combined with other genomics tools, such as Electronic Fluorescent Pictograph (eFP) browsers [1] to help visualise where a gene is expressed, allows faster identi cation of gene function in different species. To date, eFP browsers have been successfully developed in plants such as Arabidopsis [1], tomato [2], strawberry [3], and pineapple [4].
TFs are one of the largest groups of genes in a genome; in Arabidopsis there are over 1500 TFs described, belonging to a number of different classes representing 5% of all genes [5]. In other species TFs represent 3-5% of coding genes, with function often conserved across species [6]. TFs have been grouped into 57 different classes [5] with some classes having multiple types of DNA binding domains. Each class of TF is represented by a gene family. These gene families vary in size from species to species depending on events such as individual gene and genome duplications, leading to expansions of certain or most families [6]. In higher plants the MYB, bHLH and Zinc nger classes of TF contain many hundreds of members [6]. There are numerous examples demonstrating the strong evolutionary maintenance of TF primary protein structure across species, with the homologous genes having a similar gene function. This allows researchers to predict function by homology [7].
The MADS-box containing TFs form arguably one of the best understood classes of TF. Members of the MADS-box gene family, including the well-known oral organ structure ABCE TFs, determine many aspects of plant development [8,9]. Even though the fruiting bodies of Angiosperms are homoplasious, with eshy fruit evolving numerous times within many plant families the function of these genes appear conserved [10]. Angiosperm ower structure and fruiting bodies are remarkably conserved, with whorls of sepals, petals, stamens and carpels [8]. The MADS protein sequence is also conserved with many examples within plants demonstrating similar control mechanisms across many species [7,11].
Kiwifruit are part of the Actinidiaceae which is a basal family within Ericales [12], and contains the genus Actinidia comprising of a number of economically important fruit species such as Actinidia chinensis var. deliciosa (green kiwifruit), A. chinensis var. chinensis (gold and red kiwifruit) and A. arguta (hardy kiwifruit or kiwiberries). The green 'Hayward' kiwifruit is hexaploid, while a commercially released yellow eshed variety A. chinensis var. chinensis, 'Hort16A', and the red eshed A. chinensis var. chinensis 'Hongyang' are large fruiting diploid genotypes making them ideal for understanding molecular processes in Actinidiaceae. More recently a new Pseudomonas syringae pv. actinidiae (Psa) tolerant tetraploid gold variety, 'Zesy002', has replaced 'Hort16A' in the markets. The two diploid cultivars have been used to understand the molecular control of many aspects of development including owering, fruit ripening, colour and avour development [13][14][15][16]. Genomics tools such as CRISPR gene editing have been successfully used to edit the oral repressors in 'Hort16A' to create a small fruiting plant that can be used to rapidly test gene function in fruit, further building on their utility [17].
The rst draft kiwifruit genome was of A. chinensis 'Hongyang', published in 2013 [18], paving the way for genomics in Actinidiaceae. More recently a second A. chinensis genome of a more inbred related genotype, Red5, further improved the construction and importantly manual annotation of gene models [19]. The manual annotation of the kiwifruit genome improved the quality of the published computer predicted gene models, and provided a quality resource for future gene mining. Here we build on these data by identifying TF genes, analysing their expression over a number of tissues and providing an eFP Browser tool to analyse gene expression.

Results
Mining of transcription factor gene families Using the kiwifruit manually annotated gene models [19], protein translations containing InterPro DNA binding domains (http://www.ebi.ac.uk/interpro) were identi ed. These were manually checked, resulting in 2,839 gene models with at least one TF domain in 61 TF classes in 32 global classes ( Fig. 1, Table 1, Additional data 1). The most abundant global class of TFs in kiwifruit was the Zinc nger class, represented by 571 genes in 11 different gene classes, followed by 428 genes with MYB domains within four different classes of genes, and 333 bHLH genes within two different classes of genes (Table 1, Additional data 1). In total the 2,839 TFs represented 8.6% of the annotated genes in the kiwifruit genome. Using the DNA binding domains, each TF class was aligned in a phylogenetic tree and named using the following criteria. Firstly, if the gene had been previously published in the literature, this name was given. For genes not previously published, a sequential naming down an initial phylogenetic tree was used. This naming method allows genes within subclades to have numbers which are close to each other. An exception was made for the ARF genes where the whole family has been well characterised in a number of species [20][21][22], so a naming convention related to the closest Arabidopsis and tomato homologues [22] was taken. An example cluster of the MADS TF Type 1 and MICK cluster is shown in Fig. 2 and full phylogenies can be found in additional data 2. Within all the gene families, there were typically two closely related genes observed at each branch consistent with the reported genome duplication [19].
In kiwifruit, four classes of TFs have been previously reported, two (the R2R3 class MYBs [23] and WRKYs [24]) were based on a previous version of the kiwifruit genome, the third (the MICK type MADS-box genes) was published using EST sequence data [25] and the fourth (AP2/ERF class of gene) was based on the manually annotated genome [26]. The 96 published WRKY TFs were previously named based on sequential chromosomal locations. Of these, ve did not have an Acc annotated gene model and two models (WRKY95 and WRKY96) appear to be splice variants. The ve were annotated using the Web Apollo software and Acc numbers assigned. This study identi ed an additional 21 WRKY genes and these new genes were sequentially numbered, bringing the total to 116 WRKY genes.
A comprehensive analysis of EST sequences and full length sequence of nine MADS-box genes was reported by Varkonyi-Gasic et al. [25]. Since this study, four SVP like genes [27] and eight SOC1 like genes [28] have been reported. Further mining identi ed 58 further predicted gene models containing a MADS-box DNA binding domain. The MADS genes separated into two major clades; the Type 1 and MICK type. Previously the MICK type MADS-box genes have been shown to be key regulators of plant development, especially in oral and fruit development. Phylogenetic alignment identi ed sequences with high similarity to the well-characterised MICK-MADS genes and identi ed possible homeologous pairs of: AGAMOUS (AG) like genes, Acc25178.1 (MADS28) and Acc20728.1 (MADS29); PISTILLATA (PI) like genes, Acc24088.1 (MADS11) and Acc05042.1 (MADS12); and APETALA1 (AP1) like genes, Acc04040.1 (MADS40) and Acc02284.1 (MADS41) (Fig. 2).

Expression analysis
To establish where and when each of the TFs were expressed, a transcriptomic approach was taken. Global gene expression of different tissues and different plant developmental stages of two cultivars of A. chinensis var. chinensis, the gold fruited 'Hort16A' and 'Zesy002' were measured. Sixty-one sets of RNA-seq from root, stem, shoot, leaves, owers, and early fruit development were combined with RNA-seq reads from fruit development [15] and postharvest [15] were used and a bud development series (

Network analysis
Weighted gene co-expression network analysis (WGCNA) [30] assigned the 2,773 expressed TFs to 14 different module colour groups (Fig. 5A). When these were compared to the different tissue types, there were some clear correlations between some modules and different tissue (Fig. 5B). Given that some of the tissue may have been harvested at different times of the day, two circadian genes (a morning MYB related gene, LHY[31] -MYBR80 Acc24169.1, and an afternoon gene, GIGANTEA [32] Acc12229.1) were also correlated with the group. Weak correlations between these genes and the blue and green modules were observed (Fig. 5B). One of the strongest correlations was root tissues and the yellow module containing 323 TFs ( Table 2, Additional data 4) showing 96% correlation (Fig. 5B,C). As expected the root predominant MADS TF (Acc00495.1 MADS19) was found within this yellow module. Some tissue types showed signi cant association with more than one network group. Floral buds and open ower showed correlation between the red and purple modules with 159 and 78 genes respectively. A magenta module containing 100 genes had linkages with early fruit development while later fruit development and maturation were more associated with the tan module containing 60 genes. In the turquoise module, 457 TFs were associated with a postharvest ethylene treatment of ripe fruit. A network graph with key known TFs generated for all the ower and fruiting genes (Fig. 6) demonstrated a strong interrelatedness of this selection of genes. Calculated hub genes for the red, purple, magenta, tan and turquoise clusters were Acc17850.1 -ERF65, Acc28494.1 -bHLH281, Acc18135.1 -DOF43, Acc15461.1 -KNOX3, Acc20237.1 -GRAS13 respectively. The most similar Arabidopsis gene to bHLH281 is AT1G25330 -HALF FILLED (HAF) that speci es reproductive tract development in Arabidopsis [33]. The tan hub gene KNOX3 is similar to AtKNAT7 which has been proposed to work with AtPAP1 (the homologue of which, Acc00493.1 -MYB10/75, is also found in the tan cluster) to develop the seed coat [34].

Discussion
By combining gene mining and expression analysis of the TF families from kiwifruit we have constructed a gene network for different tissues at different developmental stages during the plant life cycle. Through a close examination of the ower and fruit networks which were associated with the red, purple, magenta, tan and turquoise modules (Fig. 6A), a number of MADS-box TFs with close homology to those characterised in other species were found. In the red module there were 10 MICK MADS-box genes including the previously published AGAMOUS (AG) gene MADS29 (Acc20728.1) and its homeologue MADS28 (Acc25178.1), a PISTILLATA-like gene, MADS11 (Acc24088.1), and two SEP-like genes, while the purple module contained ve MICK MADS genes including a second PISTILLATA-like gene, MADS12 (Acc05042.1). The ethylene treated fruit associated with the turquoise module contained three MICK MADS-like genes including the previously published RIN/SEP4-like gene (MADS52 -Acc26640.1) [15].
As the AG genes in other organisms have been shown to be the key carpel identity genes, the connectivity of the AG homologues MADS28 and MADS29 was examined further. Fifty-nine genes with a high (> 0.50 weight) association with the AG genes were identi ed and mapped (Fig. 6B). A network map of this subset shows a strong level of interdependency of these genes. Within this sub-network, genes that have been shown to be a direct target of AG in Arabidopsis were identi ed (Fig. 6B Orange). These include the aforementioned SPL, CRC and a HEC2 like bHLH (bHLH65) genes.
Within the fruit ripening ethylene associated turquoise module there were a considerable number (29) of NAC TFs identi ed, including the previously described NOR like genes (NAC1, NAC2, NAC3) [36], as well as seven of the eight EIN3-like genes. This module also included 45 ERF genes [26], as well as DOF4 [37,38]. A previous studies of fruit ripening analysis [37] described 10 TFs associated with ripening (Additional data 1), most of which were located in other coloured modules suggesting that the wider study presented here gives a better resolution of tissue speci c genes.

Conclusion
In summary, we demonstrate that the eFP browser that we constructed and customised to display gene expression data, in combination with genome wide identi cation of the TFs and weighted gene coexpression network analysis provides a powerful platform for in-depth investigation of control and regulation of important processes in the plant life cycle and these tools can be easily customised to other fruiting species.

Plant material
All new tissue presented in this study (Additional data 3) were from the diploid Actinidia chinensis var. chinensis cv. 'Hort16A'. These data were combined with previously published data from an A. chinensis var. chinensis cv. 'Hort16A' fruit maturation and ripening study [15], and a new comprehensive study of a bud series from the related tetraploid Actinidia chinensis var. chinensis cv. 'Zesy002' (Voogd et al., in preparation

Gene mining, and comparisons
To identify a complete set of transcription factors a number of approaches were taken, for each approach lists were generated and combined and ltered. In brief, a family was chosen to mine based on the classes identi ed in Arabidopsis [5]. The plant transcription factor database (http://planttfdb.cbi.pku.edu.cn/) which contains 2296 kiwifruit models generated from the original 2013 genome [18] were also assessed. Using an in house database (Bioview) platform, the manually annotated gene models [19] were identi ed containing the appropriate PFam domain [39]. Lastly, gene models identi ed from multiple BLASTP searches using appropriate diverse TFs matches were compared to the manually annotated gene sets. These were all aligned in geneious [40] using MUSCLE alignment [41].
Using the PFam domain to identify the DNA binding domain in the aligned regions, genes without a binding domain were removed, thus creating a complete list of TFs. For the R2R3 MYBs and WRKY family a few genes did not appear to have an Acc model. When this occurred, Web Apollo [42] was used to manually annotate an additional gene model and designated with either the Achn model number or a "no Acc match" in Additional data 1. The DNA binding domains were then aligned using PHYML [43], and named sequentially.

RNA sequencing and transcriptomics
Total kiwifruit RNA was isolated using the Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO, USA) and its integrity was assessed using the RNA 6000 Nano kit and the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Three RNA samples per sampling time were used for subsequent library construction and sequencing. The sequencing libraries were constructed according to the TruSeq RNA sample preparation guide (Illumina, San Diego, CA, USA) and subsequently sequenced by a HiSeq 2500 Sequencing System (Illumina), obtaining paired-reads of 125 bp. Library construction and sequencing were performed at Macrogen (www.macrogen.com). A minimum of 16M reads were obtained from the Hort16A tissues and percentage read mapping ranging from 69-92% Additional data 3. The Zys002 bud series had a minimum of 11.9M reads and percentage read mapping was 51-79%. All raw sequence data can be found in the NCBI RNAseq depositories detailed in Additional data 3. The raw reads were aligned using Spliced Transcripts Alignment to a Reference (STAR) (version 2.5.2a) on default settings [44] to the A. chinensis Red 5 v1.69 gene models [19]. Reads were normalised to transcripts per million (TPM).

Development of eFP browser
The eFP browser [1] was generated using deposited code from SourceForge

WGCNA Network analysis
RNAseq data for gene models associated with transcription factors were extracted and transcription factors that were not expressed (based on no read alignments) in any of the samples were removed from the analysis. Weighted gene co-expression network analysis (WGCNA) [30]  Authors' contributions AR harvested and advised on tissues for this experiment, LB undertook the RNAseq preparation and with PM and BW, the RNAseq data processing. BW created the eFP browser, NN, KD, JR, AA and RS mined the TF gene families and undertook phylogenetic alignment, EV, AA and RS supervised LB and BW and conceptualised the study. RS undertook the network analysis and wrote the manuscript. All authors read and edited the manuscript.