Abstract
Bacteria may adapt their gene expression patterns to survive in diverse environments. This
adaptation can be achieved by differentially regulating sets of genes through the activity of
transcriptional activators or repressors. A common mechanism is negative auto-regulation,
in which a transcription factor (TF) represses expression of its own gene. This feature is
observed in about 40% of known TFs in E. coli. Although the helix-turn-helix (HTH) is the
most common DNA-binding domain among bacterial TFs, tools that use genomic data and
computational biology techniques to decipher the regulatory characteristics of HTH TFs
are still lacking.
In this study, I conducted a genomic analysis of known HARPs (HTH Auto-Repressor
Proteins) in E. coli. I used a Random Forest (RF) classification algorithm to predict
HARPs from their genomic attributes and structural protein features, achieving an accuracy
exceeding 83%. Among the 132 uncharacterized HTH TFs of E. coli, the program
(PromAnalyzer) classified 65 as HARPs at an RF prediction score threshold of >0.5.
I then extended the methodology with a phylogenetics-based approach to predict new
transcription factor binding sites (TFBSs) for the predicted HARPs that have
characterized orthologs in Enterobacteria. Testing the program on a set of known
TFs in this taxonomic group yielded an accuracy of over 92%, with more than 90% overlap.
This analysis identified TFBSs for 25 of the predicted HARPs. Notably, the program also
suggested TFBSs for known HARPs such as LysR and AlsR.
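
As a minimal illustration of the score cutoff described above, the sketch below treats the RF prediction score as the fraction of trees voting "HARP" and calls a TF a HARP only when that fraction is strictly greater than 0.5. This reading of the score is an assumption, and the function names, TF labels, and votes are hypothetical; none of them come from PromAnalyzer.

```python
from typing import Dict, List


def rf_score(tree_votes: List[int]) -> float:
    """Fraction of trees voting 'HARP' (1) for a single TF (assumed score definition)."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    return sum(tree_votes) / len(tree_votes)


def call_harps(votes_by_tf: Dict[str, List[int]], threshold: float = 0.5) -> List[str]:
    """Return the TFs whose score is strictly above the threshold, sorted by name."""
    return sorted(
        tf for tf, votes in votes_by_tf.items() if rf_score(votes) > threshold
    )


# Hypothetical per-tree votes for three placeholder TFs (not real data).
example_votes = {
    "tfA": [1, 1, 1, 0, 1],  # score 0.8
    "tfB": [0, 0, 1, 0, 0],  # score 0.2
    "tfC": [1, 0, 1, 0],     # score exactly 0.5, so not called at ">0.5"
}
predicted = call_harps(example_votes)
```

With a strict cutoff, a TF scoring exactly 0.5 is left uncalled, which matches the ">0.5" wording used here.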
To test these predictions experimentally, I selected 10 predicted HARPs identified by
PromAnalyzer and assessed their auto-repression using a fluorescence reporter in a
double-plasmid system. RFP expression in experimental strains expressing the TF was
compared with that in control strains lacking the TF. Six HTH TFs (HdfR, GntR, HypT, NsrR, DecR,
UlaR) exhibited statistically significant auto-repression under the specified experimental
conditions within 8 hours. Additionally, KdgR displayed significant auto-repression, but
over a longer time frame.
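
The comparison between TF-expressing and control strains can be sketched as follows. The statistical test used in this work is not specified in this abstract, so the exact one-sided permutation test below is a stand-in, and the fluorescence values and the function name `repression_test` are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def repression_test(
    with_tf: Sequence[float], control: Sequence[float]
) -> Tuple[float, float]:
    """Return (fold repression, one-sided exact permutation p-value).

    The p-value is the fraction of relabelings of the pooled measurements
    whose control-minus-TF mean difference is at least the observed one.
    """
    pooled = list(with_tf) + list(control)
    n_tf = len(with_tf)
    n_ctrl = len(control)
    total = sum(pooled)
    observed = mean(control) - mean(with_tf)

    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_tf):
        tf_sum = sum(pooled[i] for i in idx)
        diff = (total - tf_sum) / n_ctrl - tf_sum / n_tf
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return mean(control) / mean(with_tf), hits / count


# Hypothetical RFP readings (arbitrary units), four replicates per strain.
rfp_with_tf = [100.0, 110.0, 90.0, 105.0]
rfp_control = [400.0, 420.0, 380.0, 410.0]
fold, p_value = repression_test(rfp_with_tf, rfp_control)
```

An exact permutation test avoids distributional assumptions and is feasible for the small replicate counts typical of reporter assays, but with four replicates per group the smallest attainable one-sided p-value is 1/70.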
Given the prevalence of auto-regulation across various TF families, these findings
underscore the potential of the methodology applied in PromAnalyzer to uncover previously
uncharacterized TF-TFBS interactions within E. coli.