Predicting protein functions from redundancies in large-scale protein interaction networks
- National Aeronautics and Space Administration Advanced Supercomputing Division, National Aeronautics and Space Administration Ames Research Center, Moffett Field, CA 94035
Edited by Peter G. Wolynes, University of California at San Diego, La Jolla, CA, and approved August 14, 2003 (received for review April 28, 2003)
Abstract
Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than expected at random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (≈89%) of the original associations.
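The criterion above, that two proteins sharing more interaction partners than expected at random are functionally associated, can be sketched by scoring each pair with a hypergeometric tail probability. This is a minimal sketch under our own assumptions, not the authors' code: the network is an undirected edge list, n_total is the number of proteins in the data set, and "more than random" is scored as the chance that two partner sets of the observed sizes, drawn uniformly at random, overlap in m or more proteins. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def shared_partner_pvalue(n_total, n1, n2, m):
    """Chance that two proteins with n1 and n2 partners, each drawn uniformly
    at random from n_total proteins, share m or more partners
    (upper tail of the hypergeometric distribution; assumed scoring model)."""
    hits = sum(
        comb(n1, k) * comb(n_total - n1, n2 - k)
        for k in range(m, min(n1, n2) + 1)
    )
    return hits / comb(n_total, n2)


def score_pairs(edges, n_total):
    """Map each protein pair with at least one shared partner to its p-value.
    `edges` is a hypothetical undirected list of (protein, protein) tuples."""
    partners = {}
    for a, b in edges:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    scores = {}
    for p, q in combinations(sorted(partners), 2):
        m = len(partners[p] & partners[q])  # number of common partners
        if m > 0:
            scores[(p, q)] = shared_partner_pvalue(
                n_total, len(partners[p]), len(partners[q]), m
            )
    return scores


# Toy network (illustrative only): A and B bind the same partners X, Y, Z.
edges = [("A", "X"), ("A", "Y"), ("A", "Z"),
         ("B", "X"), ("B", "Y"), ("B", "Z"), ("C", "X")]
scores = score_pairs(edges, n_total=20)
for pair, p in sorted(scores.items(), key=lambda item: item[1]):
    print(pair, f"{p:.2e}")
```

A smaller score means the observed overlap is less likely under the random model, so in this toy network the pair (A, B) ranks first. Ranking by a tail probability rather than by the raw count of shared partners keeps highly connected proteins from dominating the list simply because they have many partners.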
Footnotes
* To whom correspondence should be addressed. E-mail: Shoudan.Liang@nasa.gov.
This paper was submitted directly (Track II) to the PNAS office.
† Since the data set contains N = 4,692 proteins, 1/N² ≈ 10⁻⁸ is a reasonable cutoff. The number is validated by a more rigorous comparison with the random network shown in Fig. 2. However, this is not a sharp threshold, as we discuss in more detail. Therefore, we present pairs up to 2 × 10⁻⁴ on our web site at www.nas.nasa.gov/bio.
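The footnote's arithmetic and its two reporting thresholds can be checked in a few lines. This is a sketch under stated assumptions: scores at or below a cutoff are treated as passing, `tier_pairs` is a hypothetical helper, and the pair scores in the example are invented purely for illustration.

```python
N = 4692                  # proteins in the data set, per the footnote
rule_of_thumb = 1 / N**2  # about 4.5e-8, i.e. of order 1e-8

STRICT_CUTOFF = 1e-8      # cutoff used for the reported associations
WEB_CUTOFF = 2e-4         # looser cutoff for the web-site listing


def tier_pairs(scores, strict=STRICT_CUTOFF, loose=WEB_CUTOFF):
    """Split {pair: p_value} into pairs passing the strict cutoff and pairs
    that pass only the looser one (hypothetical helper; <= is assumed)."""
    strict_pairs = {pair: p for pair, p in scores.items() if p <= strict}
    loose_only = {pair: p for pair, p in scores.items() if strict < p <= loose}
    return strict_pairs, loose_only


print(f"1/N^2 = {rule_of_thumb:.2e}")
strict_pairs, loose_only = tier_pairs(
    {("A", "B"): 3e-10, ("C", "D"): 5e-6, ("E", "F"): 0.01}
)
print(sorted(strict_pairs), sorted(loose_only))
```

Because the threshold is not sharp, keeping the looser tier separate lets a reader inspect borderline pairs without mixing them into the high-confidence set.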
-
‡ Here, we use the functional classes and annotations provided in ref. 9. The actual number of unannotated proteins at present may be lower than this source indicates.
- Copyright © 2003, The National Academy of Sciences





