Abstract
Recombinant protein production is a cornerstone of modern biotechnology and has been used to produce many proteins of scientific and commercial interest. The outcome depends on the balance among the intricate stochastic processes involved. Two of the critical processes are protein expression and solubility. Together, failures at these two steps reduce the success rate of protein production to around 25%. Furthermore, toxicity of recombinant proteins may also significantly reduce the amount of protein produced. Therefore, predicting and optimising expression and solubility, and detecting toxic proteins early, could save resources and assist in better planning of experiments.
In this work, we show that mRNA accessibility, measured through the opening energy, and protein structural flexibility, measured using normalised B-factors, describe protein expression and solubility, respectively, with higher accuracy than other features. We also develop a new and more accurate metric for predicting protein solubility, the Solubility-Weighted Index (SWI). Using these findings, we develop two tools: a gene expression prediction and optimisation tool, Translation Initiation coding region designer (TIsigner), available at https://tisigner.com/tisigner, and a protein solubility prediction and optimisation tool, Soluble Domain of Protein Expression (SoDoPE), available at https://tisigner.com/sodope. We also develop a third tool, Razor, available at https://tisigner.com/razor, for the detection of toxins. To assist in maximising protein production, we integrate these three tools into a pipeline that optimises protein expression and solubility and detects toxins.