Abstract
Protein-protein interactions (PPIs) are critical to all cellular activities. Although cells contain a large number of proteins, they exert spatial and temporal control over PPIs to avoid dysregulation of cellular pathways. Considerable research effort has gone into finding new PPIs, curating PPIs from the literature, and building searchable PPI databases. These databases are widely used by experimental and computational scientists. Here we find that the PPIs captured by these databases are highly heterogeneous and concentrated in a small number of species. These issues hinder researchers from capturing the full landscape of reliable PPIs, which affects the accuracy of machine-learning models and the effectiveness of experimental designs. However, there are opportunities to fill these gaps computationally and experimentally. We suggest developing a phylogenetically informed approach to testing PPIs experimentally and computationally.