- Research article
- Open Access
Prediction and functional analysis of the sweet orange protein-protein interaction network
BMC Plant Biology volume 14, Article number: 213 (2014)
Sweet orange (Citrus sinensis) is one of the most important fruits world-wide. Because it is a woody plant with a long growth cycle, genetic studies of sweet orange are lagging behind those of other species.
In this analysis, we employed ortholog identification and domain combination methods to predict the protein-protein interaction (PPI) network for sweet orange. The K-nearest neighbors (KNN) classification method was used to verify and filter the network. The final predicted PPI network, CitrusNet, contained 8,195 proteins with 124,491 interactions. The quality of CitrusNet was evaluated using gene ontology (GO) and Mapman annotations, which confirmed the reliability of the network. In addition, we calculated the expression difference of interacting genes (EDI) in CitrusNet using RNA-seq data from four sweet orange tissues, and also analyzed the EDI distribution and variation in different sub-networks.
Gene expression in CitrusNet has significant modular features. Target of rapamycin (TOR) protein served as the central node of the hormone-signaling sub-network. All evidence supported the idea that TOR can integrate various hormone signals and affect plant growth. CitrusNet provides valuable resources for the study of biological functions in sweet orange.
Protein-protein interactions (PPIs) are involved in almost all aspects of cellular processes. Understanding the interactions between proteins is an important goal of systems biology, and such knowledge can provide crucial insights into protein function and molecular mechanisms. Various experimental technologies, such as affinity purification mass spectrometry (AP-MS) , the yeast two-hybrid (Y2H) system , and protein arrays – have been applied to detection of genome-wide PPIs in many model species, including Homo sapiens, Drosophila melanogaster, Saccharomyces cerevisiae, and Caenorhabditis elegans. However, technical problems limit the efficacy of some of the aforementioned high-throughput screening methods. For example, the PPIs identified to date represent only a small fraction of the full PPI network in the relevant species. Furthermore, the techniques used to identify PPIs are labor-intensive and costly, further limiting the application of these methods. In addition, most experimental methods have condition- or method-specific features; consequently, the data obtained by various methods sometimes exhibit minimal overlap even within the same species.
To overcome the disadvantages of existing experimental methods and expand the coverage of PPIs, various bioinformatics approaches have been developed, including ortholog- and domain-based methods, gene neighbor methods, and gene fusion methods –. Various types of data are used to predict PPIs, including protein sequences and functions, protein structures, gene expression, etc. Many public databases have been constructed to store the experimental and theoretical predicted PPIs, including the Database of Interacting Proteins (DIP) , Molecular INTeraction database (MINT) , protein InterAction database (IntAct) , Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) , Biological General Repository for Interaction Datasets (BioGRID) , and the Human Protein Reference Database (HPRD) .
Currently, the comprehensive determination of protein interactomes in plant species is lagging behind analogous efforts in model animals. Using Y2H, Dreze and colleagues generated a highly reliable binary PPI network of Arabidopsis thaliana containing about 6,200 interactions among 2,700 proteins . To date, computational prediction of PPIs has only focused on a few model plant species, such as A. thaliana– and Oryza sativa. Research on protein interactomes in perennial woody plants is lacking, in part due to limited genetic studies in these species.
Citrus is commercially cultivated in over 130 countries and districts, and citrus fruit holds a dominant position in the global fruit industry (http://faostat.fao.org). Although many germplasms and cultivars exist, comprehensive morphological, anatomical, and molecular evidences has confirmed that all modern cultivars (such as sweet orange) are the derived offspring of two primitive species, pummel (C. grandis) and mandarin (C. reticulata) ,. Genomic evidence, including a large body of genetic information from Citrus, has demonstrated that sweet orange is a natural hybrid of pummel and mandarin ,. Sweet orange, which is consumed fresh and as juice, contains large amounts of vitamin C and other compounds beneficial to human health; this species accounts for more than 60% of world fruit production (http://faostat.fao.org). Thus, due to its economic and genetic importance, sweet orange is an important woody model plant. The draft genome of this species provides an unprecedented opportunity to investigate its genetics, biochemistry, and evolution , and also provides a good opportunity to study protein-protein interactions in Citrus.
In this analysis, we predicted the sweet orange PPI network using ortholog-based and domain-based interaction methods, based on careful preparation and selection of reliable data resources, and then applied K-nearest neighbors (KNN) method to verify and filter the predicted PPIs and obtain a more confident network. The final PPI network in sweet orange, CitrusNet, contains 8,195 proteins and 124,491 interactions. This is the first report of a genome-wide PPI network in a woody plant. The quality of this PPIs network was carefully assessed. Furthermore, using RNA-seq data from four tissues, we analyzed important functional modules and obtained insights into the tissue-specific features of the sweet orange protein interactome.
Topological properties of the PPI network
Based on a combination of ortholog- and domain-based prediction methods, we obtained 146,056 PPIs, and then assessed and filtered all the predicted PPIs by KNN. The final CitrusNet contained a total of 8,195 proteins and 124,491 interactions (Figure 1a). KNN verified 85.2% of the predicted PPIs, similar to the rate in six other model organisms (93.3%), and much higher than that of random PPIs (13.4%, Figure 1b), indicating that CitrusNet is highly accurate. When we used Cytoscape to visualize and analyze the topological properties of CitrusNet , 7,885 of the total 8,195 proteins were connected into a large interconnected sub-network. The topological properties indicated that CitrusNet was a scale-free network, i.e., most nodes had low degrees of connection, whereas a few hub nodes had very high degrees of connection. In total, 602 proteins in CitrusNet had more than 100 connections. Table 1 lists the top ten hub nodes, including eight heat shock-related proteins, one polyubiquitin A protein, and one DNA-directed RNA polymerase II subunit protein. These data indicate that heat shock proteins play important roles in the citrus interactome. The degrees of the nodes in CitrusNet follows a power-law distribution fit by the regression equation y = 4885.8*x-1.407 (r2 = 0.851, P < 0.0001, Figure 1c), where x indicates degree and y indicates the number of nodes with a given degree.
Nodes with a high betweenness centrality (BC, which counts the fraction of shortest paths going through a node) are also very important for a network, because these nodes act as bottlenecks in the network even in the absence of hubs . Table 2 lists the ten nodes with the largest BC in CitrusNet. Five of these nodes are also among the ten hub nodes with the highest degrees (Cs4g11150.3, Cs4g08220.1, Cs4g08220.3, Cs9g19220.1 and Cs5g03150.1); four of those are heat shock-related proteins, and the other is polyubiquitin A protein. Thus, all of these proteins play critical roles in the connection and communication among nodes in CitrusNet. Based on the combination of degree and BC analysis, we speculate that heat shock proteins are the most important proteins in the sweet orange PPI network.
The clustering coefficient (CC, a measure of degree to which nodes in a graph tend to cluster together) of CitrusNet is 0.301, much higher than that of a random network with the same degree distribution (CC = 0.014). This feature indicated that CitrusNet is very intensive, and several sub-networks containing protein complexes or signaling pathways could be identified. The relationship between CC and degree in CitrusNet is shown in Figure 1d. From the distribution map, we observed that nodes with small degrees were intensively distributed, whereas nodes with large degrees were very sparse, further confirming the scale-free property of CitrusNet.
Quality assessment of CitrusNet
Because no experimental PPIs are available in sweet orange, we employed gene ontology (GO) annotation analysis to evaluate CitrusNet. Because interacting proteins tend to have similar or related functions, a reliable PPI network should contain many more functionally related proteins than a random network. We performed GO analysis to compare the functional relationships among protein nodes in CitrusNet and random networks, and then obtained GO annotation from the GO database . In total, 27,943 proteins in sweet orange had GO annotations, and 7,327 of these (constituting 109,409 interactions) were included in CitrusNet. For these comparisons, we calculated the shortest distance of GO terms for protein pairs in CitrusNet and five random networks. The number of proteins and protein pairs in random networks were the same as in CitrusNet, it is worth noting, however, that the protein pairs in random networks did not interact. Figure 2a shows that the shortest distance of GO terms in CitrusNet was much shorter than the average distance in five random networks.
Moreover, we analyzed many gene modules in CitrusNet to determine whether they contained known biological pathways in sweet orange. To this end, we used Mapman annotation, which represents biological functions with no redundancy between different terms . We constructed a Mapman network for CitrusNet, and employed the clustering algorithm described in reference  to extract relevant modules. Figure 2b shows that the relevant Mapman terms were highly correlated, whereas terms with no functional relationships had low correlation. Detailed analysis of the RNA transcription sub-network illustrated that PPIs provide useful biological information (Figure 2c). In the RNA transcription sub-network, terms related to RNA transcription and global transcription factors exhibited complicated interactions (Figure 2c). Furthermore, terms related to chromatin remodeling factors, histone deacetylase activity, and histone acetyltransferase activity had many connections with the RNA transcription sub-network, suggesting that chromatin remodeling and histone acetylation/deacetylation might be closely related to the transcription process. The Mapman annotation network indicated that CitrusNet contained a considerable number of biological modules, and the clustering algorithm successfully grouped terms that were previously classified as multimeric complexes.
Modular features of protein-interaction networks revealed by RNA-seq data analysis
The relationship between gene expression and protein interactions has been comprehensively analyzed ,. In general, genes that encode interacting proteins are correlated at the level of mRNA expression ; therefore, correlations in gene expression should reflect the reliability of the predicted PPI network. Based on RNA-seq data from four sweet orange tissues (callus, leaf, flower, and fruit), we proposed an index, the expression difference of interacting protein pairs (EDI). Interacting genes with similar expression levels had EDI values closer to zero. We employed the Markov Clustering (MCL)-modular method  to extract relevant modules from CitrusNet. MCL is a clustering method for extracting relevant modules from the constructed interactome, and is superior to other clustering methods under most conditions . We compared the EDI distributions of four CitrusNet-related networks, including CitrusNet, a random network, a MCL-modular network and a network inferred from yeast protein complexes . In total, we obtained 159 protein complexes in sweet orange by using the best hits from sequence alignments with the yeast complexes (Additional file 1). Figure 3a shows the EDI distribution of these four types of networks. The protein complex network had the highest percentage of genes with EDI around zero, suggesting that protein complexes have the highest proportions of genes with similar expression levels. The MCL-modular network had a higher percentage of similarly expressed genes than CitrusNet, whereas the random network had the lowest percentage of EDI around zero, indicating that the MCL algorithm was highly effective at extracting relevant modules, and that the predicted CitrusNet was reliable. Currently, no protein complexes have been experimentally determined in sweet orange, therefore, the predicted 159 protein complexes in this analysis are very helpful for related studies.
We also analyzed the EDI distribution in CitrusNet based on its topological properties. First, we divided all genes in the network into two groups (genes with degree above ten were defined as high-degree genes, and other genes were defined as low-degree genes), and then divided the high-degree genes into two subgroups (high BC and low BC, respectively). Figure 3b illustrates that the percentage of low-degree genes with EDI around zero was much higher than the percentage of high-degree genes, because the high-degree genes had many interacting partners with different functions and expression levels, whereas low-degree genes had far fewer interacting partners whose functions and expression levels were more likely to be similar. This phenomenon reflects the ‘economical’ principle of gene interactions, i.e., genes tend to interact with as few partners as possible to serve their biological functions. We also observed the same features for EDI and BC topological parameters. The percentage of high-degree/low-BC genes with EDI around zero was higher than that of high-degree/high-BC genes, suggesting that genes with low BC are likely to have similar expression levels.
In addition to studying the EDI distribution in different kinds of sub-networks, we also analyzed EDI variations among genes in the four tissues. We performed an in silico analysis to simulate the effect of removing specific nodes in the network. We defined different thresholds of EDI variation values from 10 to 100, with a step size of 1. Based on these thresholds, we removed nodes with EDI variation values higher than the thresholds in CitrusNet, and compared the average shortest path length after randomly removing the same number of nodes in CitrusNet. The shortest path length between two nodes reflects the overall network connectivity. In total, we removed 1658 nodes for EDI variation = 10, and 117 nodes for EDI variation = 100. The distributions of average shortest path lengths of these two types of interactome networks are shown in Figure 3c. The average shortest paths of EDI variation -removed networks were all longer than those of randomly-removed networks, indicating that nodes with high EDI variation values were very important for network connectivity. Figure 3c also shows that nodes with EDI variation values in the range 10–20 were crucial for the stability of the network, and that removing these nodes resulted in the largest shortest path lengths.
Finally, we analyzed the modular features of gene expression in CitrusNet. Figure 3d illustrates a sub-network comprising highly expressed genes in at least three tissues (Additional file 2). Genes with similar functions were clustered into modules. For example, genes related to protein degradation and ubiquitin (blue), DNA synthesis and chromatin structure (red), cell cycle and cell division (grass green), protein synthesis (orange), and metabolite transport (purple), are clustered into functional groups. The EDI values of genes in these modules were smaller than those of genes with different functions. Therefore, EDI value reflects the function relationships between interacting gene pairs. Based on this comprehensive analysis, we concluded that gene expression patterns in CitrusNet have modular features.
Analysis of the plant hormone-related interaction network
Plant hormones such as ethylene, abscisic acid, auxins, gibberellins, cytokinins and brassinosteroids, play important roles in plant biology. To identify important hormone-signaling proteins in sweet orange, we constructed PPI sub-networks related to hormone-signaling pathways, based on information from the Arabidopsis Hormone Database (AHD2.0) . These sub-networks were simplified by filtering proteins involved in basic biological process. In the hormone-signaling sub-network, proteins related to ethylene, jasmonic acid, salicylic acid, auxins, brassinosteroids and cytokinins formed a highly interconnected network, with target of rapamycin (TOR) as its center (Figure 4). In this sub-network, TOR had the highest BC, and therefore served as the hub protein. TOR is a conserved protein with similar functions in yeast, human and Arabidopsis,. Although TOR has not been extensively studied in plants, experimental evidence in Arabidopsis has shown that this protein functions as a hub involved in the regulation of growth, stress resistance, and mRNA translation . Proteins related to hormone signaling and hormone receptors do not interact with TOR directly, but through other connecting proteins. As shown in Figure 4, protein phosphatase 2A (PP2A) regulatory subunit and catalytic subunit both had direct interactions with TOR, and the catalytic subunit also interacted with the Raf-like protein kinase CTR1, which is a negative regulator of the ethylene response pathway in Arabidopsis. Furthermore, jasmonic and salicylic acid signaling proteins had many interactions with each other, most of these interactions were mediated by mitogen-activated protein kinases (MPK4, MMK1), and many of them also interacted with mitogen-activated protein kinase-kinase MKK2 and MKK6, which directly interacted with TOR. Three different link paths connected brassinosteroid signaling proteins and TOR. Shaggy-related protein iota/kinase eta had similar interaction patterns, both interacted with TOR through protein phosphatase PP1 and phosphatidylinositol 3-/4- kinase (TRRAP). Brassinosteroid insensitive-1 (BRI1) and the downstream protein BSU1 connected with TOR through 14-3-3 protein. Six auxinic compound transporters directly interacted with TOR; therefore, we speculate that TOR regulates auxin signal transduction.
The links among different hormone signaling proteins were very complex, because several of the interacting proteins (PP2A, protein phosphatase PP1, TRRAP, 14-3-3 protein, and MAP kinase) were connected to a considerable degree. Many of the hormone-related proteins interacted with each other, indicating that various hormone-signaling pathways exchange information before affecting TOR (or vice versa). Although many plant hormone-related proteins were connected, TOR is undoubtedly the hub of hormone crosstalk. The central role of TOR implies that it integrates environmental information communicated by hormones, and uses this information to regulate plant growth.
In this study, we predicted the genome-wide PPI network of sweet orange, based on ortholog identification and domain-combination methods. Because there is no golden-standard PPI dataset in sweet orange, we employed a highly accurate KNN algorithm to filter the predicted interactions. The resultant PPI network, CitrusNet, contains 8,195 proteins with 124,491 interactions. We then employed GO and Mapman annotation to assess the predicted network. The shortest distance of GO terms in CitrusNet was much smaller than that of random networks, and the MapMan biology network demonstrated that gene modules in CitrusNet reflected known biological pathways in sweet orange. We also predicted 159 protein complexes in sweet orange using orthologs of the yeast protein complexes and employed them to assess CitrusNet; the results revealed that protein complexes had relatively tight connections. Together, these analyses confirm that CitrusNet is highly reliable.
We also analyzed the modular features of PPI networks based on the connections in CitrusNet. We proposed the EDI index to analyze the relationship between gene expression and protein interactions. The percentage of protein nodes with similar gene-expression levels in CitrusNet-derived sub-networks was much higher than those of random networks, especially in protein-complex and MCL-modular networks. Protein complexes are groups of two or more associated polypeptide chains, which are predicted to have tight network connections and to be encoded by genes that are expressed at similar levels. Analysis of the EDI distribution, based on topological properties, revealed that genes interacted with as few partners as possible (the ‘economical’ principle). Furthermore, EDI values around zero were negatively correlated with topological parameters such as degree and BC. In addition to analyzing EDI distributions, we also compared EDI variation among four tissues and found that genes with low EDI variation values were important for the stability of CitrusNet.
Protein interactions play central roles in many different signal-transduction processes, including protein structure transformation, protein phosphorylation and dephosphorylation, and hormone signal processing. Many genes and biological processes regulated by hormone signals have been identified , and a growing number of studies have focused on the crosstalk among different hormone signals –. We constructed a PPI sub-network related to hormone-signaling proteins and observed that TOR serves as the central hub for hormone crosstalk. In general, TOR controls cell growth by regulating basic processes such as protein translation, RNA transcription, and protein degradation ,. Hormone signals pass through relay proteins, ultimately affecting TOR activity. Although TOR has been well studied in yeast, human, and Arabidopsis, its function in hormone-signaling pathways has not been previously investigated. Although we identified reliable connections between TOR and hormone-signaling proteins, the underlying regulatory patterns have not yet been elucidated. Many of the predicted plant hormone interactions in CitrusNet are supported by experimental evidence in other species ,,. For example, PP2A can down-regulate the activity of TOR , whereas the relationship between PP2A and another interacting protein CTR1 has not been previously reported. Considering that ethylene negatively regulates plant growth, we speculate that CTR1 can down-regulate the activity of PP2A catalytic subunit. Previous studies showed that 14-3-3 protein plays important roles in brassinosteroid signaling pathways ; however, the biological significance of the interaction between 14-3-3 protein and BRI1 remains unclear. Furthermore, 14-3-3 protein exerts a positive effect on TOR ; therefore, there may be a regulatory relationship between 14-3-3 protein and BRI1. The relationships between TOR and other plant hormone-related proteins have not been reported previously. Our analysis of the genome-wide PPI network illustrated the central role of TOR in hormone-signaling pathways. In the future, CitrusNet should provide further insights into the functional genomics and systems biology of sweet orange.
In this analysis, we predicted the genome-wide PPI network of sweet orange using ortholog identification and domain-combination methods, and then employed a highly accurate KNN algorithm to filter the predicted interactions. The resultant PPI network contains 8,195 proteins and 124,491 interactions. We employed GO and Mapman annotation to assess the predicted network. We further predicted 159 protein complexes in sweet orange using orthologs of the yeast protein complexes and employed them to assess CitrusNet. We finally constructed a PPI sub-network related to hormone-signaling proteins, and found that TOR serves as the central hub for hormone crosstalk. CitrusNet provides a valuable resource for protein-protein interactions in sweet orange.
Complete genomic sequence, annotation, and transcription data of sweet orange were obtained from the sweet orange genome annotation project . RNA-seq data were obtained from four different tissues: flower, leaf, fruit, and callus. High-quality experimental PPI data from six reference model organisms, A. thaliana, S. cerevisiae, C. elegans, D. melanogaster, H. sapiens and Mus musculus, were collected from the public databases BIOGRID , IntAct , DIP , STRING , MINT , HPRD , and TAIR , To ensure the reliability of the data, we performed validation. Redundant sequences from different PPI databases were assigned to unique identification numbers using BLAST, with the following parameters: identity = 100% and coverage >99%. Final data are provided in Additional file 3.
Methods for predicting PPIs
In a genome-wide statistical trend, orthologues are the most similar genes in different species, and are generally assumed to retain equivalent functions . Therefore, if two orthologous proteins interact in one species, they are prone to interact in other species. We searched for orthologous proteins between sweet orange and each of the six reference organisms, using Inparanoid (standalone version 4.1) . In this analysis, A. thaliana had a higher score than the other five organisms. First, all potential orthologs between sweet orange and each of the six reference genomes were identified, and then potential orthologs in different organisms were grouped together. Inparanoid scores were normalized in the range of 0–1, with a higher maximum threshold (4600) to retain conserved genes. If a protein corresponded to multiple orthologs in one reference genome, the orthologous protein with the highest score was retained. For any two proteins in sweet orange, if their orthologs in the six reference genomes had at least one experimentally verified interaction, the two proteins were predicted to interact. To rank candidate protein interactions, a normalized score was defined:
In equation (1), InScore A and InScore B indicated the normalized Inparanoid score for an interacting protein pair (A and B) in one reference organism. The final score for a protein pair was the sum of the scores in the six reference genomes. The threshold was set to 0.15, according to the distribution pattern of the score.
Domains, the fundamental functional units of proteins, participate in intermolecular interactions between related proteins. If the domains of two proteins are confidently known to interact, then the proteins themselves can be predicted to interact ,,. Consequently, PPIs can be inferred from their domain-domain interactions (DDIs). Han and colleagues considered that PPIs might be the consequences of groups of domains or multiple domain interactions; therefore, we applied their method to obtain accurate PPI networks based on DDIs . We proposed the notion of domain combinations and domain-combination pair (dc-pair). To overcome the limitation of conventional domain-prediction methods, our model framework regarded dc-pairs as the basic units of PPIs. We set up the following probabilistic model to predict the interaction probability of protein pairs. Domains of sweet orange proteins were identified, using the Hmmer tool , to search against the Pfam database (including PfamA and PfamB) . The threshold of Hmmer was set to 1e-3, and the domain alignment coverage was set to >0.6 for multi-domain proteins and >0.9 for single-domain proteins.
Suppose a protein pair A and B contained m and n domains, respectively. Then, Protein A and B have 2m-1 and 2n-1 possible domain combinations, and (2m-1) × (2n-1) dc-pairs would be obtained. The score for each dc-pair was defined as follows:
Using equation (2), all the dc-pair scores in the six model organisms were calculated. Considering that a dc-pair might appear in many different protein pairs with different scores, the highest score was selected for each dc-pair. After obtaining the scores of all the dc-pairs, we calculated the scores of potential interacting protein pairs in sweet orange. The final score for each protein pair (A and B) was defined as follows.
In equation (3), rated-dc-pairs indicates the dc-pairs of protein pair A and B in the six reference organisms, and all-dc-pairs indicates all the dc-pairs of protein A and B. The final score in equation (3) denotes the protein pair (A and B) obtained from the domain-combination method. An appropriate threshold was set to obtain a credible result.
Method for filtering and assessing PPI data
Because no experimental PPI dataset is available for sweet orange, we employed third-party cross-validation methods to filter and assess the predicted PPIs based on ortholog identification and domain-combination. K-nearest neighbor (KNN), a simple and accurate method to validate predicted PPIs, requires positive and negative training datasets . In this analysis, the positive samples were experimentally determined PPIs in the six reference organisms, which were divided into five equal groups. The corresponding negative samples were randomly selected protein pairs from sweet orange, excluding the real or related protein pairs from positive samples (Pearson correlation >0.3). Each of the five groups had the same number of protein pairs in the positive and negative sample sets. The prediction results were checked by five independent training datasets, and predicted interactions supported by at least two datasets were kept. To compare the difference between PPI network and random network, this procedure was also performed for random networks.
Methods for processing RNA-seq data
RNA-seq reads were generated by the Illumina and SOLiD platforms and aligned to the reference genome using TopHat (v1.2.0) and BioScope (v1.3) with default parameters . Gene-expression levels for each RNA-seq library were calculated as reads per kilobase exons model per million mapped reads (RPKM) . For hierarchical clustering analysis, we used clustering software Cluster 3.0 to perform complete linkage hierarchical clustering on genes, using uncentered Pearson’s correlation as the distance measure. Supposing that gene A interacted with m genes, we defined the expression difference of interacting genes (EDI) as follows.
In equation (4), RPKM A and RPKM i indicate the RNA-seq expression values for gene A and its interacting partner i, and D A and D i indicate the degree of gene A and gene i. If gene A has EDI value close to zero, the expression levels of gene A and its interacting partner are similar.
After obtaining EDI, we calculated EDI variation values for each gene in four tissues; callus, leaf, flower, and fruit. The EDI variation value was defined in equation (5).
EDI highest and EDI lowest represent the highest and lowest EDI of a gene in the four tissues, respectively. If a gene was not expressed in one tissue, we defined EDI lowest as zero.
Expression difference of interacting genes
Target of rapamycin
Affinity purification mass spectrometry
Brown AP, Affleck V, Fawcett T, Slabas AR: Tandem affinity purification tagging of fatty acid biosynthetic enzymes in Synechocystis sp. PCC6803 and Arabidopsis thaliana. J Exp Bot. 2006, 57: 1563-1571.
Causier B, Davies B: Analysing protein-protein interactions with the yeast two-hybrid system. Plant Mol Biol. 2002, 50: 855-870.
Angenendt P, Kreutzberger J, Glokler J, Hoheisel JD: Generation of high density protein microarrays by cell-free in situ expression of unpurified PCR products. Mol Cell Proteomics. 2006, 5: 1658-1666.
Popescu SC, Popescu GV, Bachan S, Zhang Z, Seay M, Gerstein M, Snyder M, Dinesh-Kumar SP: Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays. Proc Natl Acad Sci U S A. 2007, 104: 4730-4735.
Ramachandran N, Hainsworth E, Bhullar B, Eisenstein S, Rosen B, Lau AY, Walter JC, LaBaer J: Self-assembling protein microarrays. Science. 2004, 305: 86-90.
Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, et al: Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005, 437: 1173-1178.
Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, et al: A protein interaction map of Drosophila melanogaster. Science. 2003, 302: 1727-1736.
Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627.
Walhout AJM, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal M: Protein interaction mapping in C. elegans using proteins involved in vulval development. Science. 2000, 287: 116-122.
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature. 1999, 402: 86-90.
Han DS, Kim HS, Jang WH, Lee SD, Suh JK: PreSPI: a domain combination based prediction system for protein-protein interaction. Nucleic Acids Res. 2004, 32: 6312-6320.
Huang TW, Tien AC, Huang WS, Lee YC, Peng CL, Tseng HH, Kao CY, Huang CY: POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics. 2004, 20: 3273-3276.
Lee SA, Chan CH, Tsai CH, Lai JM, Wang FS, Kao CY, Huang CY: Ortholog-based protein-protein interaction prediction and its application to inter-species interactions. BMC Bioinformatics. 2008, 9 (Suppl 12): S11-
Tsoka S, Ouzounis CA: Prediction of protein interactions: metabolic enzymes are frequently involved in gene fusion. Nat Genet. 2000, 26: 141-142.
Valencia A, Pazos F: Computational methods for the prediction of protein interactions. Curr Opin Struct Biol. 2002, 12: 368-373.
Wojcik J, Schächter V: Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics. 2001, 17: S296-S305.
Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30: 303-305.
Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, Castagnoli L, Cesareni G: MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012, 40: D857-D861.
Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B, Orchard S, Hermjakob H: The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012, 40: D841-D846.
Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C: STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009, 37: D412-D416.
Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, Reguly T, Rust JM, Winter A, Dolinski K, Tyers M: The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 2011, 39: D698-D704.
Goel R, Harsha HC, Pandey A, Prasad TS: Human Protein Reference Database and Human Proteinpedia as resources for phosphoproteome analysis. Mol Biosyst. 2012, 8: 453-463.
Dreze M, Carvunis AR, Charloteaux B, Galli M, Pevzner SJ, Tasan M, Braun P, Vidal M: Evidence for network evolution in an Arabidopsis interactome map. Science. 2011, 333: 601-607.
Brandao MM, Dantas LL, Silva-Filho MC: AtPIN: Arabidopsis thaliana protein interaction network. BMC Bioinformatics. 2009, 10: 454-
Geisler-Lee J, O'Toole N, Ammar R, Provart NJ, Millar AH, Geisler M: A predicted interactome for Arabidopsis. Plant Physiol. 2007, 145: 317-329.
Li P, Zang W, Li Y, Xu F, Wang J, Shi T: AtPID: the overall hierarchical functional protein interaction network interface and analytic platform for Arabidopsis. Nucleic Acids Res. 2011, 39: D1130-D1133.
Zhu P, Gu H, Jiao Y, Huang D, Chen M: Computational identification of protein-protein interactions in rice based on the predicted rice interactome network. Genomics Proteomics Bioinformatics. 2011, 9: 128-137.
Cheng Y, de Vicente MC, Meng H, Guo W, Tao N, Deng X: A set of primers for analyzing chloroplast DNA diversity in Citrus and related genera. Tree Physiol. 2005, 25: 661-672.
Pedrosa A, Schweizer D, Guerra M: Cytological heterozygosity and the hybrid origin of sweet orange [Citrus sinensis (L.) Osbeck]. Theor Appl Genet. 2000, 100: 361-367.
Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao WB, Hao BH, Lyon MP, Chen JJ, Gao S, Xing F, Lan H, Chang JW, Ge XH, Lei Y, Hu Q, Miao Y, Wang L, Xiao SX, Biswas MK, Zeng WF, Guo F, Cao HB, Yang XM, Xu XW, Cheng WJ, Xu J, Liu JH, et al: The draft genome of sweet orange (Citrus sinensis). Nat Genet. 2013, 45: 59-66.
Kohl M, Wiese S, Warscheid B: Cytoscape: software for visualization and analysis of biological networks. Methods Mol Biol. 2011, 696: 291-303.
Joy MP, Brock A, Ingber DE, Huang S: High-betweenness proteins in the yeast protein interaction network. J Biomed Biotechnol. 2005, 2005: 96-103.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261.
Thimm O, Blasing O, Gibon Y, Nagel A, Meyer S, Kruger P, Selbig J, Muller LA, Rhee SY, Stitt M: MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. 2004, 37: 914-939.
Li Y, Agarwal P, Rajagopalan D: A global pathway crosstalk network. Bioinformatics. 2008, 24: 1442-1447.
Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002, 12: 37-46.
Segal E, Wang H, Koller D: Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics. 2003, 19 (Suppl 1): i264-i271.
Tornow S, Mewes HW: Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res. 2003, 31: 6283-6289.
Brohee S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006, 7: 488-
Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009, 37: 825-831.
Jiang Z, Liu X, Peng Z, Wan Y, Ji Y, He W, Wan W, Luo J, Guo H: AHD2.0: an update version of Arabidopsis Hormone Database for plant systematic studies. Nucleic Acids Res. 2011, 39: D1123-D1129.
Diaz-Troya S, Perez-Perez ME, Florencio FJ, Crespo JL: The role of TOR in autophagy regulation from yeast to plants and mammals. Autophagy. 2008, 4: 851-865.
Raught B, Gingras AC, Sonenberg N: The target of rapamycin (TOR) proteins. Proc Natl Acad Sci U S A. 2001, 98: 7037-7044.
Deprost D, Yao L, Sormani R, Moreau M, Leterreux G, Nicolai M, Bedu M, Robaglia C, Meyer C: The Arabidopsis TOR kinase links plant growth, yield, stress resistance and mRNA translation. EMBO Rep. 2007, 8: 864-870.
Kieber JJ, Rothenberg M, Roman G, Feldmann KA, Ecker JR: CTR1, a negative regulator of the ethylene response pathway in Arabidopsis, encodes a member of the raf family of protein kinases. Cell. 1993, 72: 427-441.
Santner A, Estelle M: Recent advances and emerging trends in plant hormone signalling. Nature. 2009, 459: 1071-1078.
Gazzarrini S, McCourt P: Cross-talk in plant hormone signalling: what Arabidopsis mutants are telling us. Ann Bot. 2003, 91: 605-612.
Ludwig AA, Romeis T, Jones JD: CDPK-mediated signalling pathways: specificity and cross-talk. J Exp Bot. 2004, 55: 181-188.
Spoel SH, Dong X: Making sense of hormone crosstalk during plant immune responses. Cell Host Microbe. 2008, 3: 348-351.
Dennis PB, Fumagalli S, Thomas G: Target of rapamycin (TOR): balancing the opposing forces of protein synthesis and degradation. Curr Opin Genet Dev. 1999, 9: 49-54.
Fingar DC, Blenis J: Target of rapamycin (TOR): an integrator of nutrient and growth factor signals and coordinator of cell growth and cell cycle progression. Oncogene. 2004, 23: 3151-3171.
Selvaraj A, Thomas G: Phosphatase 2A puts the brakes on mTORC1 nutrient signaling. Cell Metab. 2010, 11: 245-247.
Chang IF, Curran A, Woolsey R, Quilici D, Cushman JC, Mittler R, Harmon A, Harper JF: Proteomic profiling of tandem affinity purified 14-3-3 protein complexes in Arabidopsis thaliana. Proteomics. 2009, 9: 2967-2985.
Bertram PG, Zeng C, Thorson J, Shaw AS, Zheng XF: The 14-3-3 proteins positively regulate rapamycin-sensitive signaling. Curr Biol. 1998, 8: 1259-1267.
Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E: The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 2012, 40: D1202-D1210.
Gabaldón T, Koonin EV: Functional and evolutionary implications of gene orthology. Nat Rev Genet. 2013, 14: 360-366.
Ostlund G, Schmitt T, Forslund K, Kostler T, Messina DN, Roopra S, Frings O, Sonnhammer EL: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010, 38: D196-D203.
Han D, Kim HS, Seo J, Jang W: A domain combination based probabilistic framework for protein-protein interaction prediction. Genome Inform. 2003, 14: 250-259.
Ta HX, Holm L: Evaluation of different domain-based methods in protein interaction prediction. Biochem Biophys Res Commun. 2009, 390: 357-362.
Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011, 39: W29-W37.
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2008, 36: D281-D288.
Guarracino M, Nebbia A: Predicting protein-protein interactions with k-nearest neighbors classification algorithm. 2010.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012, 7: 562-578.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628.
The present study was supported by the National Natural Science Foundation of China (31271968 and 31271406), the Program for New Century Excellent Talents in University (NCET-13-0807) and the Fundamental Research Funds for the Central Universities (2013PY004, 2014PY018).
The authors declare that they have no competing interests.
DYD and CJW collected the data, performed the analysis and wrote the draft manuscript. CLL and CYJ supervised the project and wrote the manuscript. GJ and CDJ set up the web server and helped to collect the data. LS helped to prepare the figures. XQ and DXX provided sequencing materials and revised the manuscript. All authors carefully read and approved the final manuscript.
Electronic supplementary material
Additional file 1: The predicted 159 protein complexes in sweet orange, using best hits of sequence alignments from yeast complexes.(XLS 974 KB)
Additional file 2: Genes highly expressed genes in at least three of the four tissues; callus, flower, fruit, and leaf.(XLS 201 KB)
About this article
Cite this article
Ding, YD., Chang, JW., Guo, J. et al. Prediction and functional analysis of the sweet orange protein-protein interaction network. BMC Plant Biol 14, 213 (2014). https://doi.org/10.1186/s12870-014-0213-7
- Protein-protein interaction
- Plant hormone