Comparative transcriptome analysis identified important genes and regulatory pathways for flower color variation in Paphiopedilum hirsutissimum

Background Paphiopedilum hirsutissimum is a member of Orchidaceae family that is famous for its ornamental value around the globe, it is vulnerable due to over-exploitation and was listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora, which prevents its trade across borders. Variation in flower color that gives rise to different flower patterns is a major trait contributing to its high ornamental value. However, the molecular mechanism underlying color formation in P. hirsutissimum still remains unexplored. In the present study, we exploited natural variation in petal and labellum color of Paphiopedilum plants and used comparative transcriptome analysis as well as pigment measurements to explore the important genes, metabolites and regulatory pathways linked to flower color variation in P. hirsutissimum. Result We observed that reduced anthocyanin and flavonoid contents along with slightly higher carotenoids are responsible for albino flower phenotype. Comparative transcriptome analysis identified 3287 differentially expressed genes (DEGs) among normal and albino labellum, and 3634 DEGs between normal and albino petals. Two genes encoding for flavanone 3-hydroxylase (F3H) and one gene encoding for chalcone synthase (CHS) were strongly downregulated in albino labellum and petals compared to normal flowers. As both F3H and CHS catalyze essentially important steps in anthocyanin biosynthesis pathway, downregulation of these genes is probably leading to albino flower phenotype via down-accumulation of anthocyanins. However, we observed the downregulation of major carotenoid biosynthesis genes including VDE, NCED and ABA2 which was inconsistent with the increased carotenoid accumulation in albino flowers, suggesting that carotenoid accumulation was probably controlled at post-transcriptional or translational level. In addition, we identified several key transcription factors (MYB73, MYB61, bHLH14, bHLH106, MADS-SOC1, AP2/ERF1, ERF26 and ERF87) that may regulate structural genes involved in flower color formation in P. hirsutissimum. Importantly, over-expression of some of these candidate TFs increased anthocyanin accumulation in tobacco leaves which provided important evidence for the role of these TFs in flower color formation probably via regulating key structural genes of the anthocyanin pathway. Conclusion The genes identified here could be potential targets for breeding P. hirsutissimum with different flower color patterns by manipulating the anthocyanin and carotenoid biosynthesis pathways. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03256-3.


Background
Paphiopedilum hirsutissimum is one of the important members of Orchidaceae family having 736 genera. It is the second largest family of flowering plants mainly famous for its aesthetic and ornamental values [8]. Genus Paphiopedilum is known by common name Venus or lady's slipper due the resemblance of their labellum or synsepalum with shoe. The 75 species of this genus are distributed worldwide [33]. Recently, most species of Paphiopedilum have faced a rapid decline, becoming endangered due to its narrow distribution, climate change, habitat loss, and overcollection for their beautiful, unique flowers. They were listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora, which prevents them from being traded across borders [61]. P. hirsutissimum is mainly prevalent in Southern China and India, and is well adapted to cultivation. Flower color in orchids is very fascinating trait, essential for the ornamental value [7]. Floral color variations mainly depend upon pigment formation and the mechanism is well characterized in many plant species [48].
Another important non-chlorophyll pigments that give yellow to red color to plant organs are carotenoids [9]. Apart from their major role in color formation, carotenoids are also an important determinant of nutritional value of fruits and vegetables [9]. The key enzymes involved in multistep metabolic pathway of carotenoid biosynthesis are well characterized in many plant species. Most important structural genes of this pathway codes enzymes including phytoene synthase (PSY), phytoene desaturase (PDS), ζ-carotene desaturase (ZDS), carotene isomerase (CRTISO), lycopene β-cyclase (β-LCY), lycopene ε -cyclase (ε-LCY), zeaxanthin epoxidase (ZEP) and Hyb [4]. Several studies have elucidated the genes encoding key enzymes of the carotenoid biosynthesis pathway [12,53]. Transcriptional regulation of these genes was shown to be involved in carotenoid accumulation in plants [12]. In Oncidium orchids, carotenoids were shown to be involved in yellow color of their lip [10]. However, no detailed study has been conducted to elucidate the genes and molecular mechanism of carotenoid biosynthesis with respect to color formation in P. hirsutissimum.
Besides the structural genes, many transcription factors (TFs) also have prominent roles in determining flower colors by affecting anthocyanin and carotenoid biosynthesis pathways [25,30]. These TFs most often regulate the expression of important structural genes and are referred to as regulatory genes [1,16,70]. These TFs belong mainly to R2R3-MYB, basic helix loop helix (bHLH), MADS box and ethylene response elements (ERF) families [5,14,23,28,54]. The MYB and bHLH TFs act by forming a MYB-bHLH-WD40 protein complex to bind with the promoters of anthocyanin biosynthesis genes to regulate their expression [47,65]. The complex formation between TTG1 (WD40), GL3/EGL3/ TT8 (bHLH), and MYB75/MYB90/MYB113/MYB114 accumulation in tobacco leaves which provided important evidence for the role of these TFs in flower color formation probably via regulating key structural genes of the anthocyanin pathway.

Conclusion:
The genes identified here could be potential targets for breeding P. hirsutissimum with different flower color patterns by manipulating the anthocyanin and carotenoid biosynthesis pathways.
Keywords: Paphiopedilum hirsutissimum, Albino, Anthocyanin, Carotenoids, Flower color, Comparative transcriptome (MYB) to control anthocyanin biosynthesis was demonstrated in Arabidopsis [17,56]. Many MYB TFs were also identified in Orchidacea family such as DhMYB2 in Dendrobium spp. [35,64], OgMYB1 in Oncidium spp. [11], PeMYB2, PeMYB11, PeMYB12 in Phalaenopsis spp. [25]. Any change in the expression of these particular TFs could affect the pattern of flower color [28,54]. Thus, identification of new genes encoding these TFs in P. hirsutissimum will help to understand the molecular basis of flower color variation in orchids.
Transcriptome profiling has contributed significantly in identification of new genes for many important traits [63,69]. It often provides a basis for functional characterization of new genes involved in a variety of functions such as development and stress-responsiveness [63,69]. In this study, we used RNA-seq based transcriptome analysis to identify important genes and TFs involved in flower color variation in P. hirsutissimum. We have identified several structural genes and important TFs regulating key enzymes of anthocyanin and carotenoid biosynthesis pathways. Our study provides a basis for functional characterization of several candidate genes for flower color variation and to understand its mechanism in orchids. The knowledge would be helpful in breeding P. hirsutissimum genotypes with a variety of flower color patterns.

Plant materials and growth conditions
Two variants of P. hirsutissimum are used in the present study. The plant materials are available at the Paphiopedilum Ex Situ Conservation Germplasm Resource Nurseries (22°78′N, 108°27′E) of the Flower Research Institute, Guangxi Academy of Agricultural Sciences, Nanning, China. The color of the petals and labellum were confirmed by the RHS Color Chart. Normal flowers have red-purple petals from tip to center with grey-brown at the base (NP) and grey-brown labellum with brown spots (NL), while its albino variant has light yellow green petals from tip to center with yellow-green part at the base (AP) and yellowish green labellum (AL). Paphiopedilum plants were grown in a greenhouse under natural light at 30/25 °C (day/night) and watered regularly as needed. Floral tissues from above mentioned flower types (NP, NL, AP and AL) were collected in the daytime to avoid any effect on gene expression due to differences in circadian rhythms. The light yellow green part from the tip to the center of the petals of the albino variant, the redpurple part from the top to the center of the petals of the normal species, and all the labellum petals of the normal species and albino species were taken from individual plants and completed three biological replicates. Samples were kept in aluminum foil and put immediately in liquid nitrogen and stored at − 80 °C freezer until use.

Biochemical analysis
Flavonoids, anthocyanins and carotenoids content in the petals and labellums were identified and quantified by using an LC-ESI-MS/MS system [62].

RNA isolation and cDNA library preparation
Total RNA was isolated from collected tissues using RNAprep Pure Plant Kit (Beijing, China) following manufacturer's protocol [69]. RNA integrity and concentration were determined on agarose gel and Nanodrop, respectively. High quality messenger RNA was treated with Oligo (dT) and subjected to fragmentation to create short fragments. After purification of cDNA, polyA tails were added to both ends, ligated with adapters and used for library preparation. A total of 12 cDNA libraries were sequenced using Illumina HiSeq ™ 2000 to obtain short sequences from both ends.

De novo transcriptome assembly and annotation
Raw reads were filtered to get clean reads by removing sequences with adapters, and low-quality transcripts. Due to absence of reference genomes, the clean reads were de novo assembled into transcripts using Trinity (version: v2.9.0) [19]. This assembly has a high degree of completeness with Benchmarking Universal Single-Copy Orthologs (BUSCO) score of 88%, which includes 1267 complete BUSCOs (88%), 1112 complete singlecopy BUSCOs (77.2%), 155 complete duplicated BUSCOs (10.8%), 42 fragmented BUSCOs (2.9%), and 131 missing BUSCOs (9.1%), out of 1440 total BUSCO groups searched. Further details of the number of transcripts in each group are explained in results section. The functions of the unigenes were annotated from NCBI non-redundant protein (NR), Swiss-Prot, Clusters of orthologous groups for eukaryotic complete genomes (KOG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using diamond software, and mapped to Pfam databases by HMMER.

Analysis of differentially expressed genes
Gene expression levels were calculated and normalized using mapped reads by the fragments per kilobase of transcript per million mapped reads (FPKM) method using bowtie [58]. DESeq2 software was used to compare the differential expression levels between two samples [6]. Differentially expressed genes (DEG) were identified based on |log2FC| > 1 and the false discovery rate (FDR) < 0.05. GO and KEGG pathway enrichment analysis were performed from significant DEGs using R software.

PCA, correlation and transcription factor analysis
Principal component analysis (PCA) was performed using FPKM values to identify the sample clusters and distribution pattern. Correlation analysis was performed using FPKM values to check the association of data among the replicates and comparing with other samples. PCA and correlation were performed using R studio. Venn diagram and heatmaps were generated using TBtools software.
To identify the DEGs that belong to transcription factor families, unigenes sequences were aligned using BLASTx to the plant transcription factor database (http:// plant tfdb. cbi. pku. edu. cn/ index. php) that includes 58 plant transcription factor families from 165 plant species. Candidates that contained DNA binding domains were recognized by GO annotation for the final TF identification.

Quantitative real time PCR analysis
Around 5 μg of total RNA was reverse transcribed into complementary DNA (cDNA) using HiScript II Q RT Supermix (Vazyme, China). For qPCR reaction, ChamQ SYBR qPCR Master Mix was used in an ABI Prism 7500 sequence detection system [68]. Transcript level normalization was done using the endogenous Actin gene. The relative gene expression was calculated using the 2 -ΔΔCT method described by [39]. Primers were designed using PrimerQuest Tool (http:// sg. idtdna. com/ Prime rquest/ Home/ Index) and the sequences are presented in supplementary Table 1.

Vector construction and genetic transformation in tobacco
Target genes were amplified from P. hirsutissimum cDNA samples using homologous recombination primers with 20-bp 5′ upstream sequence including 14-bp vector homologous sequence followed by 6-bp of restriction enzyme site. The primer sequences are mentioned in supplementary Table S1. The empty vectors of pCAM-BIA2300-GFP plasmid were linearized using double enzyme digestion SacI and XbaI. The amplified fragments of target genes were cloned into linearized vectors of pCAMBIA2300-GFP under CamV35s promoter using NovoRec ® plus One step PCR Cloning Kit (Novoprotein, China). The positive transformants of E. coli were confirmed by PCR followed by sequencing. The recombinant plasmids containing target gene were then transformed into Agrobacterium tumefacien using electroporation method and single colonies were selected on LB plates with kanamycin antibiotic. The A. tumefacien containing recombinant plasmids were transformed into tobacco leaves using needle-less syringe and kept in dark at room temperature until further analysis.

Statistical analysis
Data was analyzed using R language. All assays were performed in triplicate. Mean values were compared by analysis of variance (ANOVA) combined with Duncan's multiple range tests.
Normal flowers have red-purple petals from tip to center with grey-brown at the base (NP) and grey-brown labellum with brown spots (NL), while its albino variant has light yellow green petals from tip to center with yellow-green part at the base (AP) and yellowish green labellum (AL).

Phenotypic and biochemical characterization of P. hirsutissimum flowers
Two flower types of P. hirsutissimum were selected in the present study. Normal flower shows red-purple petals from tip to center with grey-brown at the base (NP) and grey-brown labellum with brown spots (NL) but its albino variant shows light yellow green petals from tip to center with yellow-green part at the base (AP) and yellowish-green labellum without purple spots (AL) (Fig. 1a,b). To understand the biochemical basis of flower color variation in P. hirsutissimum, we measured anthocyanins, carotenoids and flavonoid contents. The biochemical profile showed a considerable decrease in total anthocyanins in AL and AP compared to NL and NP, respectively (Fig. 1c). Similarly, total flavonoids were also reduced in AL and AP compared to NL and NP, respectively (Fig. 1d). In contrast, we observed a slight increase in total carotenoids in AL and AP compared to NL and NP, respectively (Fig. 1e). These findings suggest a possible positive correlation of anthocyanins and flavonoids with normal flower phenotype, and that of carotenoids with albino phenotype. In other words, reduced anthocyanin and flavonoid contents along with slightly higher carotenoids are responsible for albino flower phenotype (AL and AP).

Analysis of transcriptome dataset
To understand the molecular basis of flower color variation and differential anthocyanin and carotenoid contents, transcriptome analysis of normal (NL and NP) and albino (AL and AP) flower tissues was performed. Using 3 biological replicates for each sample, a total of 12 cDNA libraries were developed and used for transcriptome analysis (RNA-seq). A total of 18,236,750-21,697,775 reads were obtained which were de novo assembled into transcripts. Transcript assembly led to the identification of 34,806 transcripts in normal samples (NL and NP) with an N50 score of 1194 kb. For albino samples (AL and AP), 28,805 transcripts were obtained with the N50 score of 1341 kb. Benchmarking Universal Single-Copy Orthologs (BUSCO) program revealed that the integrity of transcripts of normal samples (NL and NP) was 83.1% with redundancy rate of 6.6%, while integrity for albino samples (AL and AP) was 84.7% with 8.9% redundancy rate. This shows a high quality of transcriptome dataset to study further expression profiles of samples. The gene expression levels of unigenes were determined in term of fragments per kilobase of exon per million fragments mapped (FPKM) values. Principal component analysis (PCA) based on FPKM values divided the samples in 4 distinct groups with each sample making a separate group with its replicates (Fig. 2a). Samples from normal flower (NP and NL) fell on the positive side of X-axis while samples from albino flower (AP and AL) fell on the negative side of X-axis. PCA analysis further showed that major portion of total variation (70.24%) consists of two principal components (PC1: 39.4%, PC2: 30.84%). Correlation analysis indicated a significant positive correlation among the 3 replicates of each sample, while a low level of correlation was observed with other samples (Fig. 2b,c). This suggests a consistent gene expression levels among all the biological replicates and proves the reliability of expression analysis.

Validation of RNA-seq dataset using qRT-PCR
Before proceeding for further analysis of transcriptome data, we attempted to validate the expression trend using qRT-PCR analysis. To do this, we have selected 9 DEGs from transcriptome dataset showing variable expression and performed qRT-PCR analysis (Fig. 3). According to qRT-PCR analysis, all genes have shown an expression trend similar to the transcriptome dataset, which indicates that the quality of our transcriptome data is good enough to produce reproducible results and is worthy to proceed for further downstream analysis.

Comparison of DEGs among the normal and albino tissues
To identify the differentially expressed genes (DEGs) between the normal and albino samples, we performed a tissue-wise comparison of DEGs by dividing into two groups; AL vs NL (labellum tissue) and AP vs NP (petal tissue). Comparative analysis identified 3287 DEGs (1713 up-regulated and 1574 down-regulated genes) between AL vs NL (Table S2, Fig. 4a), and 3634 DEGs (1822 upregulated and 1812 down-regulated genes) between AP vs NP (Table S3, Fig. 4a). Next, we identified the shared and specific number of DEGs between labellum and petal tissues. Venn diagram indicated that 1379 genes were expressed only in AL vs NL; 1726 genes were expressed only in AP vs NP; while 1908 DEGs were common between both groups (Fig. 4b). Heatmaps along with hierarchical clustering showed distinct patterns of gene expression among the normal and albino tissues (Fig. 4c). The expression pattern between the AL and AP were  S1a). Similarly, 4 DEGs were solely expressed in the normal labellum tissues and not expressed in any other tissue ( Supplementary Fig. S1b). This suggests the key role of these 39 and 4 DEGs in color formation in petal and labellum tissues.

Gene ontology and KEGG enrichment
We performed GO enrichment analysis and revealed the significantly enriched biological pathways between AL vs NL and AP vs NP (Supplementary Fig. S2). The most enriched GO terms between AL vs NL were photosynthesis, response to auxin, response to oxidative stress, and cell wall modification. The most enriched GO terms between AP vs NP were DNA metabolic process and regulation of gene expression. Since albino flower has more green color than normal flower, it may have more prolonged photosynthesis as shown in GO enriched term. The KEGG enrichment analysis for AL vs NL showed the consist results as GO enrichment with photosynthesis, plant-hormone signal transduction and phenylpropanoid as most enriched biological pathways ( Supplementary  Fig. S3a). Plant-hormone signal transduction, plantpathogen interaction and phenylpropanoid biosynthesis were revealed as most enriched pathways between AP vs NP ( Supplementary Fig. S3b). These findings suggest the involvement of photosynthetic pigments and phytohormones in flower color variation in Paphiopedilum.

DEGs involved in anthocyanin biosynthesis pathway in Paphiopedilum flowers
To further explore the mechanism of flower color formation in P. hirsutissimum, a more detailed analysis of genes involved in color related pathways is required. To study the differential regulation of anthocyanin biosynthesis in P. hirsutissimum flowers, we mapped the DEGs related to anthocyanin biosynthesis pathway. Ten genes were found to be differentially expressed between AL vs NL and AP vs NP at different steps of anthocyanin biosynthesis pathway (Fig. 5)  petal tissues of albino flower (Fig. 5). Together, these results identified important genes encoding key enzymes involved in anthocyanin biosynthesis which are possibly playing role in determining flower color variation in P. hirsutissimum.

DEGs encoding the key transcription factors related to color formation
Some transcription factor (TF) families including MYB, bHLH, MADS-Box and ERF play important roles in color formation via anthocyanin biosynthesis by regulating the expression of key structural genes. We therefore analyzed the expression pattern of these TF families in AL vs NL and AP vs NP tissues (Fig. 7). Twenty-five MYB TF encoding genes were differentially expressed among normal and albino flowers (Fig. 7a). Of these 25 genes, 14 were differentially expressed only in labellum tissues, 4 were specific to petal tissues, and 7 DEGs were differentially expressed in both labellum and petal tissues (Fig. 7a). In labellum tissue, 12 MYB genes were downregulated while 9 MYB genes were upregulated in albino flower. In petal tissue, 8 MYB genes were downregulated while 3 MYB genes were upregulated in albino flower. This could potentially affect the expression of structural genes related to color formation in albino mutant leading to loss of red-purple spots in albino flowers. Nine genes encoding bHLH TFs were found to be differentially expressed in albino vs normal flowers (Fig. 7b). Of these 9 genes, 3 were downregulated while 3 were upregulated in AL vs NL, while 4 bHLH genes were downregulated and 2 were upregulated in AP vs NP. Four MADS-Box genes were identified, of which 2 were downregulated in both labellum and petal tissues, while 2 were upregulated only in petal tissues (Fig. 7c). Lastly, we identified 17 DEGs encoding ERF and AP2/ERF TFs, and most of which were downregulated in AL and AP compared to NL and NP, respectively (Fig. 7d). For the ERF TFs, 6 were downregulated in AL vs NL, and 13 were downregulated in AP vs NP. However, 1 ERF gene was upregulated in AL, and 2 genes were upregulated in AP compared to NL and NP. Collectively, strong downregulation of MYB, bHLH, MADS-Box and ERF TFs may reduce the expression of target structural genes involved in flower color formation, leading to the albino mutant phenotype in P. hirsutissimum.

Overexpression of transcription factors increased anthocyanin accumulation
To get a deeper insight into the role of TFs in color formation, we first validated the expression of eight TFs (3 MYB, 2 bHLH, 2 ERF and 1 MADS-Box) with contrasting expression patterns in our RNA-seq data using qRT-PCR analysis. All the TFs encoding genes showed more or less similar expression trend in our qRT-PCR data as observed in RNA-seq data (Fig. 8A). This gave us confidence to move further for the functional validation of some TFs. To see if these TFs could affect the flower color formation, we transiently overexpressed 5 TFs into tobacco (Nicotiana benthamiana) leaves and measured the anthocyanin accumulation, the major regulator of Fig. 8 qRT-PCR and overexpression of candidate transcription factors. A qRT-PCR analysis of candidate transcription factors from Fig. 7 for validation of transcriptome data. Data is mean of 3 biological replicates with ± SD. B Anthocyanin content in tobacco leaves overexpressing candidate transcription factors. Ck = control without overexpression. Data is mean of 3 biological replicates with ± SD. ** shows significant difference at P < 0.01; ns = non-significant flower color. 4 out of 5 TFs belonging to each of the four tested TFs families significantly increased the anthocyanin accumulation in tobacco leaf compared to the control leaves (Fig. 8B). This showed the great potential of identification and utilization of potential TFs of these families in regulating flower color variation in P. hirsutissimum.

Discussion
Flower industry contributes significantly to the economy of several countries such as Japan [3], Netherlands, Columbia, Ecuador (https:// www. petal repub lic. com/ flori stry-and-flori cultu re-stati stics/) and Kenya (http:// www. kenya rep-jp. com/ busin ess/ flower_ e. html). P. hirsutissimum belongs to the family of orchids and have ornamental plants with unique flower patterns [25]. Variation in flower colors is an important trait considered for breeding new varieties of ornamental plants [3,31]. Understanding the mechanism of color variation could help in breeding desirable genotypes and ultimately uplift the flower industry [3]. In this study, we attempted to identify the key players of flower color formation in P. hirsutissimum plants by using natural variation in flower color. The wild-type flowers of P. hirsutissimum have pink-rose petals and yellow labellum with purple spots which are very attractive to people and give a pleasant feeling. Its natural albino variant shows off-white petals with yellowish-green labellum that provides a good source of study the mechanism of flower color variation. In accordance with the albino phenotype, we observed reduced anthocyanins and increased carotenoid contents in mutant (Fig. 1). To uncover the molecular basis of flower color variation, we used the transcriptome analysis to identify important structural and regulatory genes and highlighted key pathways regulating the flower color variation in P. hirsutissimum.

Critical pathways involved in regulation of flower color variation
Anthocyanin biosynthesis is a major pathway involved in flower and fruit color formation in orchids and other flowering plants [13]. The key enzymes of anthocyanin biosynthesis pathway are PAL, C4H, 4CL, CHS, CHI, F3H, F30H, FLS, DFR, ANS, and UFGT [13]. We identified 10 DEGs for anthocyanin biosynthesis among the normal and albino flower tissues encoding for the CHS, CHI, F3H, FLS and DFR (Fig. 5). Studies have shown that differential expression of genes encoding these key enzymes (CHS, CHI, F3H, FLS and DFR) play critical role in anthocyanin accumulation and affect the flower color phenotype [13,21,27,72]. Importantly, we observed strong downregulation of CHS encoding gene in albino labellum. As CHS catalyzes the formation of Naringenin chalcone, an important substrate of anthocyanin pathway [66], the downregulation of CHS gene is probably leading to reduced anthocyanins by providing the low amount of substrate ultimately leading to the albino labellum phenotype [13]. The importance of CHS genes in flower pigmentation has been reported in different studies and proved by transgenic approaches [43,51]. For example, transformation of antisense cDNA of CHS gene inhibited the flower color formation in petunia and caused white flower phenotype [59]. CHI is another important enzyme that converts Naringenin chalcone to Naringenin, a precursor molecule for several flavanols [32]. Gene encoding CHI was downregulated in the normal colored petals compared to albino petal (Fig. 5). As studies have shown the positive correlation of anthocyanin accumulation with CHI [11,72], the downregulation of CHI encoding gene in colored petals could be due to the feedback regulation [30]. Similarly, we observed strong downregulation of two F3H(CYP75B1) encoding genes in the albino labellum and petals (Fig. 5). F3H genes have been shown to strongly associated with red color formation in flower petals and leaves in ornamental plants [26]. These F3H genes are often activated by the MYB TFs which bind to the promoters of F3H genes at conserved motifs [57]. Being the major structural gene in anthocyanin biosynthesis pathway, the downregulation of F3H encoding genes may have affected the overall anthocyanin accumulation in P. hirsutissimum flowers [21,30], contributing to the loss of red-purple color in petals and labellum.
In addition to the anthocyanins, carotenoids also contribute in the color pigmentation in flowers and mainly confer yellow color [9,40]. Here, we identified 8 DEGs encoding the key enzymes of carotenoid biosynthesis pathway (Fig. 6). Strigolactone is an important phytohormone derived from carotenoids and function in various plant developmental processes [29,45]. Genes encoding CCD8 and DWARF27 were strongly upregulated in albino flower tissues (Fig. 6). CCD genes not only play role in strigolactone synthesis, but also play important roles in leaf senescence and flower development [52]. Thus, strong upregulation of CCD8 and DWARF27 genes in P. hirsutissimum flowers may play an important role in flower color variation which is not yet known. Here it is important to note that DWARF27, CCD8 and LUT1 regulate the formation of byproducts of carotenoid pathway, and upregulation of genes encoding these enzymes could be the compensatory response due to downregulation of other genes of carotenoid pathway [69]. VDE is an important enzyme that catalyzes the formation of violaxanthin [37], and we observed a significant downregulation of VDE gene in the albino labellum. NCED is another key enzyme in the ABA synthesis that catalyzes formation of xanthoxin, the precursor of ABA [22,46]. Xanthoxin is then used as a substrate for the synthesis of ABA via an intermediate step by the action of ABA2 enzyme [18]. ABA is known to play role in diverse metabolic processes in plants from stress responses to plant development [22,38,74]. Its role in flower development has also been demonstrated [2]. Genes encoding these key enzymes NCED (TRINITY_DN60125_c0_g1_i1|m.28132) and ABA2 (TRINITY_DN52666_c0_g2_i1|m.83328) were downregulated in albino petals and labellum, respectively (Fig. 6). Although, the increased carotenoid level in mutants is in accordance with the albino phenotype, the reduced expression of several carotenoid pathway genes could be due to negative feedback regulation and these genes may be playing role at post-transcriptional or translational level [41]. This is quite possible because the studies have found a very weak correlation of gene expression with protein abundance for many genes [60].

Transcriptional regulation of flower color variation
Transcription factors play a significant role in the regulation of flower color in Paphiopedilum species [25]. Among various TF families, MYB and bHLH are the most significant regulators of flower color variation. These TFs work in different ways to regulate flower color variation. They can either regulate the expression of other key genes of anthocyanin and carotenoid biosynthesis [25], or make a complex with other TFs to regulate level of anthocyanin and carotenoids [30]. Three MYB TFs PeMYB2, PeMYB11, and PeMYB12 activate the expression of major anthocyanin biosynthetic genes PeF3H5, PeDFR1, and PeANS3 causing the red pigmentation in flowers of Phalaenopsis spp. [25]. Silencing of these MYB TFs resulted in loss of red color in flowers along with reduced levels of anthocyanins in Phalaenopsis spp. In addition, an R2R3-MYB TF called Reduced Carotenoid Pigmentation 1 (RCP1) positively regulate the expression of carotenoid biosynthesis genes and affect carotenoid accumulation and flower color in Mimulus lewisii [49]. The loss-of-function mutant showed reduced carotenoid level and its overexpression restored the carotenoid production and complemented the flower phenotype [49]. Importantly, many MYB TFs contain the [D/E]Lx2[R/K] x3Lx6Lx3R motif that is predicted to interact with bHLH factors [25]. Several studies reported that MYB, bHLH and WD40 proteins make a complex and control flower color by activating the anthocyanin and carotenoid biosynthesis genes [30,65]. Recently, an R2R3-MYB protein, WHITE PETAL1 (WP1), was shown to play a critical role in regulating floral carotenoid pigmentation in Medicago truncatula [42]. This WP1 TF activates the expression of carotenoid biosynthesis genes by interacting with bHLH protein MtTT8 and WDR protein MtWD40-1 [42]. In the present study, we identified 25 MYB and 9 bHLH TF encoding genes that are differentially expressed in albino and colored flowers of Paphiopedilum (Fig. 7). It is very likely that these MYB and bHLH TFs regulate the genes of anthocyanin pathway either directly or by making a complex, thereby regulating albino flower phenotype in P. hirsutissimum. Previously, the function of MYB genes was tested using VIGS approach by silencing some MYB genes [25]. However, with the availability of highly efficient genome editing approach, these MYB and bHLH genes could be functionally characterized using CRISPR/ Cas9 mediated targeting of cis-elements or functional domains [70]. In addition to MYB and bHLH, MADS-Box and AP2/ERF TFs were also known to affect anthocyanin and carotenoid biosynthesis [5,28]. In apple, an ERF TF MdERF38 positively regulates anthocyanin accumulation by interacting with MdMYB1 [5]. In Bilberry, expression of a MADS box transcription factor VmTDR4 was shown to regulate anthocyanin accumulation and fruit color by interacting with MYB TFs [28]. Importantly, some MADS-box TFs such as SIMADS1 negatively regulate carotenoid biosynthesis by downregulating the expression of carotenoid pathway genes [53]. Here, we identified 4 MADS-Box and 17 ERF or AP2/ ERF TFs that were differentially expressed in P. hirsutissimum flowers (Fig. 7). Notably, MADS6 expression was strongly upregulated in albino petals which may be the reason of reduced expression of carotenoid biosynthesis genes. To provide another line of evidence, we overexpressed 5 TFs (at least one from each of the MYB, bHLH, ERF and MADS-Box family) in N. benthamiana, and 4 out of 5 TFs caused significant increase in anthocyanin accumulation (Fig. 8). This is probably due to the activation of key structural genes of anthocyanin pathway by these TFs as we observed a MYB-binding site in the promoter sequence of one of the F3H DEGs involved in catalyzing important step of anthocyanin biosynthesis pathway [21,50]. Thus, the differential expression of TFencoding genes is likely to be associated with flower color variation by modulating structural genes [11]. These TF encoding genes could be utilized via over-expression or silencing approaches to breed P. hirsutissimum plants with different flower colors [28].

Conclusions
In this study, we used comparative transcriptome and biochemical analysis to explore the important genes and regulatory pathways linked to flower color variation in P. hirsutissimum. We found that downregulation of key anthocyanin biosynthesis genes F3H and CHS is probably leading to albino flower phenotype via downaccumulation of anthocyanins. Though, as per expectation, we observed an increased carotenoid accumulation in albino flowers, the expression of important carotenoid biosynthesis genes was downregulated suggesting that carotenoid accumulation was probably controlled at post-transcriptional or translational level. Lastly, we identified several key TFs (MYB73, MYB61, bHLH14, bHLH106, MADS-SOC1, AP2/ERF1, ERF26 and ERF87) that are probably regulating the expression of important structural genes. Overexpression of 4 out of 5 TFs significantly increased the anthocyanin accumulation in tobacco which provided important evidence of the role of these TFs in flower color formation. Further functional characterization of these TFs via overexpression, knockout and protein-DNA interaction approaches could further improve our understanding of the mechanism of their action and flower color formation.