As protein-protein interaction is intrinsic to most cellular processes, the ability
to predict which proteins in the cell interact can aid significantly in identifying the
function of newly discovered proteins, and in understanding the molecular networks
they participate in. Here we demonstrate that characteristic pairs of
sequence-signatures can be learned from a database of experimentally determined
interacting proteins, where one protein contains one sequence-signature and its
interacting partner contains the other. The sequence-signatures
that recur in concert in various pairs of interacting proteins are termed correlated
sequence-signatures, and it is proposed that they can be used for predicting putative
pairs of interacting partners in the cell. We demonstrate the potential of this approach
on a comprehensive database of experimentally determined pairs of interacting
proteins in the yeast Saccharomyces cerevisiae. The proteins in this database have
been characterized by their sequence-signatures, as defined by the InterPro
classification. A statistical analysis performed on all possible combinations of
sequence-signature pairs has identified those pairs that are over-represented in the
database of yeast interacting proteins. We show that using the correlated
sequence-signatures as identifiers of interacting proteins can significantly reduce the
search space and enable directed experimental interaction screens.
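
The over-representation analysis described above can be illustrated with a minimal sketch. The text here does not specify the statistic used, so the sketch assumes a simple log-odds score: the observed frequency of a signature pair among interacting protein pairs, divided by the frequency expected if the two signatures occurred independently. The function name, the `min_count` threshold, and the toy protein and signature labels are all hypothetical, and the expected frequency is a deliberate simplification that ignores the ordering of the two partners.

```python
from collections import Counter
from itertools import product
from math import log2


def correlated_signature_pairs(interactions, signatures, min_count=2):
    """Rank signature pairs by log2(observed / expected) over interacting pairs.

    Assumed scoring scheme for illustration only; not necessarily the
    statistic used in the original analysis.
    """
    pair_counts = Counter()
    proteins = set()
    for a, b in interactions:
        proteins.update((a, b))
        # Count each unordered signature pair at most once per interaction.
        seen = {
            tuple(sorted(p))
            for p in product(signatures.get(a, ()), signatures.get(b, ()))
        }
        pair_counts.update(seen)

    # Frequency of each signature among proteins in the interaction set.
    sig_counts = Counter(s for p in proteins for s in signatures.get(p, ()))
    n_pairs, n_prot = len(interactions), len(proteins)

    scored = []
    for (s, t), count in pair_counts.items():
        if count < min_count:
            continue  # too few observations to be meaningful
        observed = count / n_pairs
        # Simplified independence model: product of single-signature frequencies.
        expected = (sig_counts[s] / n_prot) * (sig_counts[t] / n_prot)
        score = log2(observed / expected)
        if score > 0:  # keep only over-represented pairs
            scored.append(((s, t), score))
    return sorted(scored, key=lambda item: -item[1])


# Toy data with placeholder names (not real InterPro entries).
toy_signatures = {
    "A": {"S1"}, "B": {"S2"}, "C": {"S1"},
    "D": {"S2"}, "E": {"S3"}, "F": {"S4"},
}
toy_interactions = [("A", "B"), ("C", "D"), ("E", "F"), ("A", "D")]

ranked = correlated_signature_pairs(toy_interactions, toy_signatures)
for pair, score in ranked:
    print(pair, round(score, 2))
```

In this toy set the pair (S1, S2) recurs in three of the four interactions and is reported, while (S3, S4) is seen only once and is dropped by the `min_count` filter. A real analysis would also need a significance test or a correction for multiple comparisons, which this sketch omits.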