Exploring the protein universe: a study of subdomain driven evolution

Gabriel Rawcliffe

Even a relatively short polypeptide of 75 amino acids has more possible sequences than there are atoms in the observable universe. Given the vastness of protein sequence space, the fraction that has ever been sampled by nature is tiny, and the fraction still observable in nature is smaller still. There are currently fewer than 150,000,000 non-redundant protein sequences in RefSeq, and these represent only ~2000 different protein folds. This work aimed to understand the evolutionary mechanisms that gave rise to these folds and to determine how frequently this repertoire of folds can be expanded. It has been proposed that the limited protein repertoire is a consequence of ancient proteins being built from a distinct set of building blocks, called subdomains. Representing some of the earliest peptides, subdomains would have contributed a basic structural or functional motif that acted as a scaffold around which an autonomously folding domain could be built. These primordial subdomains could have associated with one another, giving rise to proteins with new structures and functions that were eventually encoded by full-length genes, a mechanism referred to as subdomain assembly. This research attempted to recapitulate subdomain assembly by recombining subdomain-sized fragments of modern proteins. Incremental Truncation for the Creation of Hybrid enzYmes (ITCHY) was used to recombine randomly sized fragments from every Escherichia coli open reading frame (ORF). An ITCHY library of ~10,000,000 chimeric genes was constructed and screened for soluble and functional proteins. The library potentially contained combinations of fragments of any size from any two E. coli ORFs. A plasmid-based solubility selection (pSALect) was used to search the library for folded proteins. This selection did not identify any hybrid proteins that could be over-expressed in soluble form and purified.
Auxotroph rescue experiments were performed to select for functional hybrids. The ITCHY library was used to transform 107 conditionally auxotrophic strains of E. coli, each carrying a single gene knockout. Six chimeric proteins were found that could rescue different knockout strains. One of these, a hybrid of MioC and SgbE that rescued the strain E. coli ∆cysN, could be expressed and purified. Biophysical characterisation showed that the hybrid, SgbE147/MioC82, has distinct secondary structural elements and apparently forms a dimer in solution. The secondary structure of SgbE147/MioC82 appears to have high thermal stability, although the protein as a whole may contain areas of disorder. The sequence of the hybrid indicates that one of the parental fragments is a β-α-β motif, one of the most ancient subdomains. This work advances previously established techniques by describing the most comprehensive library of its type produced to date. The hybrid SgbE147/MioC82 is one of the first functional proteins produced de novo by a random recombination approach that does not use rational design in any way. The generation of a novel protein that is both soluble and functional provides evidence that subdomain assembly was a viable route to the earliest folded domains. It also suggests that modern genomes retain the potential to generate novel genes, encoding soluble and functional proteins, via non-homologous recombination.
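
The opening comparison between sequence space and the number of atoms in the observable universe can be checked with a short calculation. The sketch below assumes an alphabet of the 20 standard amino acids and the commonly quoted order-of-magnitude estimate of 10^80 atoms; neither figure is stated in the abstract, and the function names are illustrative only.

```python
from math import log10

AMINO_ACIDS = 20          # assumption: the 20 standard proteinogenic amino acids
ATOMS_IN_UNIVERSE = 10 ** 80  # assumption: commonly quoted order-of-magnitude estimate


def sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct polypeptide sequences of the given length."""
    return alphabet ** length


def shortest_length_exceeding(limit: int, alphabet: int = AMINO_ACIDS) -> int:
    """Smallest chain length whose sequence space is larger than `limit`."""
    length = 1
    while sequence_space(length, alphabet) <= limit:
        length += 1
    return length


space_75 = sequence_space(75)
print(f"log10 of sequence space for 75 residues: {log10(space_75):.1f}")
print(f"exceeds atom estimate: {space_75 > ATOMS_IN_UNIVERSE}")
print(f"shortest length exceeding estimate: {shortest_length_exceeding(ATOMS_IN_UNIVERSE)}")
```

Under these assumptions a 75-residue chain has roughly 10^97.6 possible sequences, so the comparison holds with many orders of magnitude to spare.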
