
DOI 10.1002/pmic.200401138

Proteomics 2005, 5, 3116–3119

SHORT COMMUNICATION

Cooperative evolution in protein complexes of yeast from comparative analyses of its interaction network

Massimo Vergassola1, Alessandro Vespignani2 and Bernard Dujon3

1 Unité Génomique des Microorganismes Pathogènes, Structure and Dynamics of Genomes Department, Institute Pasteur, CNRS URA 2171, Paris Cedex, France
2 Laboratoire Physique Théorique, CNRS UMR 8627, Université Paris-Sud, Orsay, France
3 Unité Génétique Moléculaire des Levures (UFR 927 Université Pierre et Marie Curie and URA 2171 CNRS), Structure and Dynamics of Genomes Department, Institute Pasteur, Paris Cedex, France

A comparative analysis of Saccharomyces cerevisiae and four other yeasts, Candida glabrata, Kluyveromyces lactis, Debaryomyces hansenii, and Yarrowia lipolytica, is presented. The broad evolutionary range spanned by these organisms allows a quantitative demonstration of novel evolutionary effects in protein complexes. The evolution rates within cliques of interlinked proteins are found to bear strong multipoint correlations, witnessing a cooperative coevolution of complex subunits. The coevolution is found to be largely independent of the tendency of the subunits to have similar abundances.

Received: September 15, 2004 Revised: November 9, 2004 Accepted: November 12, 2004

Keywords: Comparative analyses / Evolution / Protein-protein interaction networks / Saccharomyces cerevisiae

Major cellular processes involve protein complexes composed of several interacting subunits [1]. Large-scale information on the assembly of complexes is extracted from global networks of proteomic interactions. For Saccharomyces cerevisiae, genome-wide data have been gathered by two-hybrid methods [2, 3] and MS [4, 5]. Interaction data are synthetically structured as undirected networks: nodes represent proteins and edges connect pairs of interacting proteins. The connectivity of a node is defined as the number of connections to other nodes. Qualitative properties shared by all interaction data sets are the nontrivial topological structure of the networks and a broad connectivity distribution (see, e.g., [6]), bespeaking the statistical abundance of “hubs”, that is, nodes with a large connectivity. This entails a complex network architecture, far from the random paradigm which has dominated past effort in network theory, and spurs the issue discussed here of the possible functional and evolutive role of the network’s topology.

The first analysis in this direction was performed in [6], where correlation signatures between gene knock-out lethality and connectivity of the encoded protein were found. Subsequent works have considered the number of proteins without an ortholog in other species [7] and protein evolution rates [8]. These studies provide interesting findings relating to evolution, biological function, and the network topology. For instance, a negative correlation between the evolution rate of a protein and its connectivity, as well as coevolution signatures for pairs of interacting proteins, is found in [8, 9]. Furthermore, groups of interacting proteins appear to have similar phylogenetic footprintings [10] in high eukaryotes [7], suggesting common functional constraints in protein complexes. Caveats have been raised on previous studies, though. The aforementioned correlation was shown to be absent in data sets made of two-hybrid interactions only [11, 12], opening an issue about possible biases in interaction networks obtained with different techniques. A more general concern is that comparative analyses were performed with respect to distant relatives of S. cerevisiae.

Correspondence: Professor Massimo Vergassola, Department of Structure and Dynamics of Genomes, CNRS URA 2171, Unité Génomique des Microorganismes Pathogènes, Inst. Pasteur, 28 rue du Dr. Roux, F-757724 Paris Cedex 15, France.
E-mail: [email protected]
Fax: +33-1-45688786

Abbreviation: BDBH, bi-directional best hit

© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Systems Biology


An exciting opportunity for comparative analyses is provided by the four hemiascomycetes, Candida glabrata, Kluyveromyces lactis, Debaryomyces hansenii, and Yarrowia lipolytica, recently sequenced in [13]. The fact that they share many functional similarities with S. cerevisiae and yet span a broad range of evolutive distances, comparable to the entire phylum of Chordates, makes them ideal for protein comparisons (phylogenetic relationships are sketched in the supplementary material). In the ensuing analysis, we compare the S. cerevisiae proteins to those of the other four hemiascomycetes, inferring putative orthologs from the primary sequence and keeping only bidirectional best hits (BDBH) to reduce the effect of the high number of paralogs in the yeast genomes. Details of the sequence comparisons are reported below. It was remarked in the literature that the interaction maps of the two-hybrid and the TAP-tag data sets overlap scantly (see [14]) and that biases might be induced [11, 12]. We have, therefore, considered three different data sets: (i) the two-hybrid data in [3, 4], (ii) the TAP-tag data of [5], and (iii) a large collection of both types of interactions, assembled at http://dip.doe-mbi.ucla.edu. Results presented hereafter refer to (iii), but they have all been crosschecked on the other data sets, as reported in the supplementary material. The simplest observation to consider is the repartition of S. cerevisiae proteins “lost,” i.e., lacking a putative ortholog in the yeast of comparison. Lower “loss” rates were reported in [13] for proteins belonging to the interaction network than for those which do not. Here, we consider the dependence on the connectivity: Table 1 reports loss rates for proteins in the low connectivity (1–6) and high connectivity (7–282) groups, with the separation value chosen at the average connectivity (= 6.3). Table 1 also reports the probability that the observed dissimilarities in the rates are due to chance.
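The grouping behind Table 1 can be illustrated with a minimal sketch, assuming the interaction network is available as a list of protein pairs and the conserved proteins (those with a BDBH in the yeast of comparison) as a set. The function names, the toy data, and the choice to ignore self-interactions are assumptions made for illustration, not the authors' code.

```python
from collections import defaultdict


def connectivity(edges):
    """Return {protein: number of distinct interaction partners}."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions are ignored
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


def loss_rates_by_connectivity(edges, conserved):
    """Split proteins at the average connectivity and report loss fractions.

    Returns the average connectivity and, for each group, a tuple
    (number of proteins, number lost, fraction lost).
    """
    k = connectivity(edges)
    k_mean = sum(k.values()) / len(k)
    groups = {
        "low": [p for p in k if k[p] <= k_mean],
        "high": [p for p in k if k[p] > k_mean],
    }
    report = {}
    for name, members in groups.items():
        lost = [p for p in members if p not in conserved]
        fraction = len(lost) / len(members) if members else float("nan")
        report[name] = (len(members), len(lost), fraction)
    return k_mean, report


# Toy network (hypothetical protein names): A is a hub, D and E have one partner.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "A")]
conserved = {"A", "B", "C"}  # proteins with a putative ortholog
k_mean, report = loss_rates_by_connectivity(edges, conserved)
print(k_mean, report)
```

In this toy network the two proteins without an ortholog both fall in the low connectivity group, which mimics the trend reported in Table 1.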
Proteins in the low connectivity group are clearly lost at higher rates. A plethora of different biological factors might affect the fate of a protein. However, it seems sensible that, as some of the functions disappear in the speciation divergence process, proteins with low connectivity are statistically more likely to be lost without major harm for the organism. This is also confirmed by the similarity of the phylogenetic profiles [10] for interacting proteins, presented in the supplementary material. Let us now consider the novel observations aimed at evidencing cooperative coevolution effects in cliques (fully connected subgraphs) of interacting proteins. For each one of the yeasts of comparison, we constructed the corresponding list of S. cerevisiae conserved proteins, i.e., those appearing both in the interaction network and in the BDBH list. A convenient indicator of their evolutive divergence rates is yielded by the evolutive ranks r. These are constructed by sorting the lists of S. cerevisiae conserved proteins in increasing order of the e-value of their best BLAST alignment to the yeast of comparison. Most conserved proteins are thus ranked first. A natural statistic condensing information on multipoint correlations among the evolutive ranks within a clique of order n is

$D = \sqrt{\sum_{i<j} (r_i - r_j)^2}$.

Here, $r_i$ is the evolutive rank of the i-th protein and the sum runs over the n(n – 1)/2 couples in the clique; the normalization does not affect the results in the sequel and is discarded for simplicity. Figure 1 presents the behavior of the evolutionary distance D for cliques of up to five proteins. The curves report the difference between the histograms in the real network and a randomized one, obtained by replacing a protein in each clique with one chosen at random. The accumulation of probability for small differences in the set of interacting proteins is clearly visible. Its statistical significance is quantified by the Wilcoxon test [15], a nonparametric test of the null hypothesis that two series of data are generated from the same distribution, reported in Table 2. Higher, but still significant, figures are found for the two-hybrid and TAP-tag data sets (i) and (ii). The corresponding curves are reported in the supplementary material, together with the curves for the individual histograms. The randomization procedure entails the replacement of a single protein among n within a clique. Its effect is thus expected to reduce as the order n increases. Furthermore, the number of proteins belonging to at least one clique of order n reduces with the order (see Table 2). The pool of proteins from which we extract the “random” replacement therefore shrinks as the order n increases. Both effects penalize cliques of higher order, implying that the significance of the dissimilarities to the randomized curves should be smaller as n increases. The opposite behavior found in several cases witnesses the importance of multipoint correlations among the divergence rates of proteins within the cliques. The effect is most visible in C. glabrata and K. lactis, the closest relatives of S. cerevisiae on the phylogenetic tree, emphasizing the importance of the choice of the organisms for the comparative analysis.

Table 1. Repartition of proteins lost, i.e., not appearing in the list of BDBH, for the comparative analysis between S. cerevisiae and the yeasts indicated in the first row. The range of the protein connectivities in the two groups is indicated in the second row. Values in the table refer to data set (iii), a collection of two-hybrid and TAP-tag data, whose average connectivity is 6.3. Rows from the third to the fifth report the total number of proteins, the number of those which are lost and the corresponding fraction for the two groups. The last row reports the probability that the difference between the loss rates in the two groups is due to chance

                C. glabrata            K. lactis              D. hansenii            Y. lipolytica
Connectivity    1–6        7–282       1–6        7–282       1–6        7–282       1–6        7–282
No. of genes    3477       1236        3477       1236        3477       1236        3477       1236
No. lost        1000       139         1089       155         1547       278         1814       375
Fraction        29%        11%         31%        13%         45%        23%         52%        30%
Probability     2.4 × 10^-31           8.7 × 10^-33           9.9 × 10^-30           3.2 × 10^-24
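The Wilcoxon comparison between the D values of real and randomized cliques, discussed in the text above, can be sketched as follows. The text does not state which variant of the test was used, so this sketch assumes the two-sample rank-sum form with the Gaussian approximation for the null distribution and no tie correction in the variance; all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def rank_sum_z(sample_a, sample_b):
    """z-score of the rank sum of sample_a under the Gaussian null.

    A negative value means that sample_a tends to hold the smaller values.
    Assumption: the variance is not corrected for ties.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean) / sd


def one_sided_p(z):
    """One-sided Gaussian tail probability for a deviation of size |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Toy D values (made up): real cliques versus their randomized counterparts.
d_real = [1.0, 2.0, 3.0]
d_random = [4.0, 5.0, 6.0]
z = rank_sum_z(d_real, d_random)
print(z, one_sided_p(z))
```

With real data, `d_real` would hold one D value per clique of a given order and `d_random` the values after randomization. The probabilities listed in Table 2 appear consistent with the one-sided Gaussian tail of the quoted z-scores, which is why `one_sided_p` is written that way here.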

www.proteomics-journal.de


Figure 1. The four plots report the difference between the histograms for cliques identified within the S. cerevisiae protein interaction network and a randomized version thereof. The observable D, defined in the text, conveys information on the strength of multipoint correlations in the evolution rates of proteins within a clique: low values of D are the signature of strong coevolutive patterns. Curves refer to the comparison between S. cerevisiae and K. lactis. Note the accumulation of events having small evolutive distances D for the protein interaction network.
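The quantities entering Fig. 1 can be illustrated with a minimal sketch: evolutive ranks obtained by sorting e-values, the distance D of a clique, and a randomized clique in which one member is replaced. Function names and toy values are hypothetical; the handling of tied e-values and the exclusion of current clique members from the replacement pool are assumptions not specified in the text.

```python
import math
import random
from itertools import combinations


def evolutive_ranks(evalues):
    """Map protein -> rank, with rank 1 for the smallest e-value.

    Assumption: ties keep the input order (no special tie handling).
    """
    ordered = sorted(evalues, key=lambda protein: evalues[protein])
    return {protein: rank for rank, protein in enumerate(ordered, start=1)}


def clique_distance(clique, ranks):
    """D = sqrt(sum over pairs i<j of (r_i - r_j)^2) for one clique."""
    return math.sqrt(
        sum((ranks[a] - ranks[b]) ** 2 for a, b in combinations(clique, 2))
    )


def randomize_clique(clique, pool, rng):
    """Replace one randomly chosen member by a random protein from pool.

    Assumption: the replacement is drawn among pool proteins that are not
    already in the clique.
    """
    members = list(clique)
    candidates = [p for p in pool if p not in clique]
    members[rng.randrange(len(members))] = rng.choice(candidates)
    return tuple(members)


# Toy e-values of the best BLAST alignment to the yeast of comparison.
evalues = {"A": 1e-50, "B": 1e-10, "C": 1e-30, "D": 1e-80, "E": 1e-20}
ranks = evolutive_ranks(evalues)
clique = ("A", "B", "C")
d_real = clique_distance(clique, ranks)
shuffled = randomize_clique(clique, list(evalues), random.Random(0))
d_random = clique_distance(shuffled, ranks)
print(ranks, d_real, shuffled, d_random)
```

Collecting `d_real` and `d_random` over all cliques of a given order yields the two histograms whose difference is plotted in Fig. 1.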

Table 2. Wilcoxon test for the dissimilarities observed in the normalized histograms corresponding to those in Fig. 1. Second and third rows report the total number of cliques and proteins appearing therein. Their depletion with the order accounts for the reduction in the magnitude of the y-axis visible in Fig. 1. The last row of the table reports the probability that the observed difference is due to chance. Its error bars, estimated from the SD measured in 10 000 realizations of the random draws, are reported on the corresponding z-scores, i.e., the deviations to the mean normalized by the SD, for the null Gaussian distribution in the Wilcoxon test

C. glabrata
No. of proteins interlinked   2               3               4               5
No. of groups                 11 006          5861            4526            3202
No. of proteins               3420            1451            716             434
z-score                       8.1 ± 0.3       12.8 ± 0.2      12.1 ± 0.1      10.5 ± 0.1
Probability                   2.8 × 10^-16    8.2 × 10^-38    5.3 × 10^-34    4.3 × 10^-26

K. lactis
No. of proteins interlinked   2               3               4               5
No. of groups                 10 652          5713            4337            3045
No. of proteins               3316            1420            711             413
z-score                       7.8 ± 0.3       12.6 ± 0.2      12.3 ± 0.1      10.1 ± 0.1
Probability                   3.1 × 10^-15    1.1 × 10^-36    4.5 × 10^-35    2.8 × 10^-24

D. hansenii
No. of proteins interlinked   2               3               4               5
No. of groups                 8290            4656            3467            2380
No. of proteins               2662            1156            612             363
z-score                       8.0 ± 0.3       8.6 ± 0.1       7.4 ± 0.1       6.0 ± 0.1
Probability                   6.2 × 10^-16    4.0 × 10^-18    6.8 × 10^-14    9.9 × 10^-10

Y. lipolytica
No. of proteins interlinked   2               3               4               5
No. of groups                 6642            3592            2648            1875
No. of proteins               2248            970             514             289
z-score                       6.1 ± 0.3       8.5 ± 0.1       6.2 ± 0.1       3.8 ± 0.1
Probability                   5.3 × 10^-10    9.5 × 10^-18    2.8 × 10^-10    7.2 × 10^-5

We conclude by highlighting the relation of the coevolutive correlations to the concentration of proteins. It was indeed observed in [16] that interacting proteins tend to have similar abundances. Furthermore, a negative correlation between the protein abundance and the evolutive divergence rates of the corresponding genes with respect to C. albicans was reported in [17]. Similar correlations for the four yeasts considered here, together with the consequences for the behavior of the evolutionary rate versus the connectivity of the proteins, are presented in the supplementary material (see also [11, 12]). Here, we concentrate on the relation to the previously established coevolutive correlations. Indeed, the correlations might a priori merely reflect the similar abundances of subunits of complexes and the relation between evolutive rates and abundances rather than genuine compensatory mutations. Note that, while specific examples of compensatory mutations are known (see, e.g., [18, 19]) and pair coevolutive correlations were presented in [8], no quantitative assessment of their global relation to the abundance of proteins was previously made. To that aim, we considered


The repartition of the lost proteins is analyzed as follows. The $N$ proteins composing the protein network are divided into two groups: $N_1$ proteins having connectivity $k \le \langle k \rangle$, and the remaining ones, with connectivity larger than the average value $\langle k \rangle$ over the whole network. The null hypothesis is that the $L$ proteins lost in the whole network are drawn from the two groups at the same rate, so that the number of lost proteins falling in the first group follows a binomial law for $L$ draws with probability $p = N_1/N$. This makes it easy to calculate the probability that a deviation larger than or equal to the one observed could be due to chance.
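The binomial null model just described can be evaluated with a short sketch. It assumes that the quoted probability is the one-sided upper tail, i.e., the chance that at least the observed number of lost proteins falls in the low connectivity group; the function names are hypothetical, and the sum is carried out through logarithms to avoid overflow of the binomial coefficients.

```python
import math


def log_binom_pmf(k, n, p):
    """Logarithm of the binomial probability of k successes in n draws."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binom_upper_tail(k_obs, n, p):
    """P(X >= k_obs) for X ~ Binomial(n, p), with 0 < p < 1."""
    return sum(math.exp(log_binom_pmf(k, n, p)) for k in range(k_obs, n + 1))


# Numbers from Table 1 for C. glabrata: N_1 = 3477 low connectivity
# proteins, 1236 high connectivity ones, and 1000 + 139 lost proteins.
n_low, n_high = 3477, 1236
lost_low, lost_high = 1000, 139
p_low = n_low / (n_low + n_high)
tail = binom_upper_tail(lost_low, lost_low + lost_high, p_low)
print(tail)
```

The printed value can be compared with the probability listed for C. glabrata in Table 1; the other three yeasts only require changing the two counts of lost proteins.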

Figure 2. Scatter plot of the abundance rank versus the evolutive rank differences for C. glabrata. The straight line is the linear regression, shown to stress the weakness of the correlation between the two quantities. The sizeable scatter of the data in the plane illustrates the lack of a functional dependence between the two quantities.

References

[1] Alberts, B., Cell 1998, 92, 291–294.
[2] Uetz, P., Giot, L., Cagney, G., Mansfield, T. A. et al., Nature 2000, 403, 623–627.
[3] Ito, T., Chiba, T., Ozawa, R., Yoshida, M. et al., Proc. Natl. Acad. Sci. USA 2001, 98, 4569–4574.
[4] Gavin, A., Bosche, M., Krause, R., Grandi, P. et al., Nature 2002, 415, 141–147.

pairs of interacting proteins having both a putative ortholog and their abundance available, measured the differences of their abundance ranks and of their evolutive ranks, and cross-correlated them. The resulting Pearson $r^2$ values are significant but $\lesssim 10^{-2}$, meaning that the predictive value of the correlation is below 1%. That seems to be a genuine weakness of dependence, as confirmed by Wilcoxon tests and the scatter plot in Fig. 2. Similar conclusions hold for multipoint correlations.

In summary, we provided evidence for cooperative coevolution within cliques of interacting proteins of S. cerevisiae. The effect was shown to be largely independent of the tendency of interacting proteins to have similar concentrations. These findings suggest that cooperative compensatory mutations are a globally relevant mechanism to preserve the specificity in the assembly of complexes throughout the evolutionary divergence processes. Multipoint patterns of compensatory mutations might be practically important for the in silico analysis of protein interactions [20]. Cliques of interacting proteins are simple instances of motifs (see, e.g., [9, 21, 22]), suggesting that the multipoint coevolutive correlations found here might be a general feature of the modular architecture of biological networks.

BLAST searches were performed using BLASTP 2.2.6 [23] with the BLOSUM 62 matrix and affine gap penalties of 11 (gap) and 1 (extension). Tables of BDBH were constructed by identifying the couples of proteins in the two organisms of comparison which are the reciprocal best alignments. The significance of the alignments was quantified by the BLAST e-values and two thresholds were considered ($10^{-1}$ and $10^{-10}$). Statistics presented here correspond to the latter, but the same conclusions hold for the former. The numbers of proteins appearing in the BDBH lists are 4327, 4227, 3521, and 3070 for C. glabrata, K. lactis, D. hansenii, and Y. lipolytica, respectively.
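The construction of the BDBH tables described above can be sketched as follows, assuming the best BLAST hit of every protein has already been extracted in both directions. The function name, the dictionary layout, and the requirement that both e-values pass the threshold are assumptions made for illustration.

```python
def bidirectional_best_hits(best_ab, best_ba, max_evalue=1e-10):
    """Return the pairs (a, b) that are reciprocal best alignments.

    best_ab maps each protein a of organism A to (best hit b, e-value);
    best_ba is the same table in the opposite direction.
    Assumption: both alignments must have an e-value <= max_evalue.
    """
    pairs = []
    for a, (b, e_ab) in best_ab.items():
        hit = best_ba.get(b)
        if hit is None:
            continue  # b has no hit at all in organism A
        a_back, e_ba = hit
        if a_back == a and e_ab <= max_evalue and e_ba <= max_evalue:
            pairs.append((a, b))
    return pairs


# Toy best-hit tables (hypothetical identifiers and e-values).
best_ab = {
    "a1": ("b1", 1e-50),
    "a2": ("b2", 1e-20),
    "a3": ("b1", 1e-30),  # paralog of a1: b1 points back to a1, not a3
    "a4": ("b4", 1e-5),
}
best_ba = {
    "b1": ("a1", 1e-48),
    "b2": ("a3", 1e-12),
    "b4": ("a4", 1e-4),
}
strict = bidirectional_best_hits(best_ab, best_ba, max_evalue=1e-10)
loose = bidirectional_best_hits(best_ab, best_ba, max_evalue=1e-1)
print(strict, loose)
```

The two calls correspond to the two thresholds quoted in the text. The toy paralog `a3` shows why keeping only reciprocal best hits reduces the effect of paralogs on the ortholog lists.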

[5] Ho, Y., Gruhler, A., Heilbut, A., Bader, G. D. et al., Nature 2002, 415, 180–183.
[6] Jeong, H., Mason, S. P., Barabasi, A. L., Oltvai, Z. N., Nature 2001, 411, 41–42.
[7] Wuchty, S., Oltvai, Z. N., Barabasi, A. L., Nat. Genet. 2003, 35, 176–179.
[8] Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. et al., Science 2002, 296, 750–752.
[9] Fraser, H. B., Wall, D. P., Hirsh, A. E., BMC Evol. Biol. 2003, 3, 11.
[10] Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D. et al., Proc. Natl. Acad. Sci. USA 1999, 96, 4285–4288.
[11] Jordan, I. K., Wolf, Y. I., Koonin, E. V., BMC Evol. Biol. 2003, 3, 1.
[12] Bloom, J. D., Adami, C., BMC Evol. Biol. 2003, 3, 21.
[13] Dujon, B., Sherman, D., Fischer, G., Durrens, P. et al., Nature 2004, 430, 35–44.
[14] Von Mering, C., Krause, R., Snel, B., Cornell, M. et al., Nature 2002, 417, 399–403.
[15] Wilcoxon, F., Biometrics 1945, 1, 80–83.
[16] Ghaemmaghami, S., Huh, W. K., Bower, K., Howson, R. W. et al., Nature 2003, 425, 737–741.
[17] Pál, C., Papp, B., Hurst, L. D., Genetics 2001, 158, 927–931.
[18] Goh, C.-S., Bogan, A. A., Joachimiak, M., Walther, D. et al., J. Mol. Biol. 2000, 299, 283–293.
[19] Pazos, F., Valencia, A., Proteins 2002, 47, 219–227.
[20] Valencia, A., Pazos, F., Curr. Opin. Struct. Biol. 2002, 12, 368–373.
[21] Hartwell, L. H., Hopfield, J. J., Leibler, S., Murray, A. W., Nature 1999, 402, C47–C51.
[22] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N. et al., Science 2002, 298, 824–827.
[23] Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J. et al., Nucleic Acids Res. 1997, 25, 3389–3402.
