Abstract
Cancer is an umbrella term for a large and varied group of diseases in which cells grow uncontrollably and acquire invasive and aggressive characteristics over time. In all cancers, metastasis to distant organs results in a significant reduction in patient survival. There is therefore a great unmet need for clinical biomarkers that predict the risk of disease recurrence. While much previous exploration of therapeutic targets and biomarkers has been directed towards proteins and protein-coding genes, the majority of the transcribed genome consists of non-coding RNAs, in particular long non-coding RNAs (lncRNAs). LncRNAs are transcripts longer than 500 nucleotides that do not encode a protein. They are well suited as biomarkers because their expression is more tissue- and cancer-specific than that of protein-coding genes.
To develop a pipeline for the discovery and characterisation of lncRNAs with potential clinical relevance, I studied their expression in triple-negative breast cancer (TNBC) using single-cell sequencing, and in colorectal cancer (CRC) using spatial transcriptomics. These methods are better suited than bulk RNA sequencing to capturing lncRNA expression, which is cell-type specific and can be lost in bulk data when a transcript is expressed in only a small population of cells (for example, the cells driving metastasis). Finally, I collaborated on the development of a toolkit for in silico characterisation of lncRNAs, aimed at non-computational biologists, to make first-pass prioritisation of lncRNA candidates an accessible task. Together, this work comprises a validated pipeline for the identification and characterisation of cancer-associated lncRNAs with potential clinical relevance as biomarkers.