Variant analysis and assembly of the KIV-2 repeat region in LPA
Graduate Thesis/Dissertation   Open access

Ruth Platt
Master of Science - MSc, University of Otago
University of Otago
2022
Handle:
https://hdl.handle.net/10523/13774

Abstract

Keywords: Lipoprotein(a); LPA; Cardiovascular disease; MinION; KIV2
Cardiovascular disease (CVD) is the leading cause of mortality worldwide, and elevated lipoprotein(a) [Lp(a)] is strongly associated with increased CVD risk. Lp(a) is a lipoprotein made up of an LDL-like particle with a large glycoprotein, apolipoprotein(a) [apo(a)], attached. Lp(a) levels are under strong genetic control, determined mainly by variation in LPA, the gene encoding apo(a). LPA arose from multiple duplications of the kringle IV (KIV) domain of the plasminogen gene (PLG), resulting in KIV types 1-10. This KIV region of LPA lies upstream of a KV domain and an inactive protease domain also duplicated from the plasminogen gene. Kringle IV type 2 (KIV-2) has undergone further replication, producing extensive copy number variation (CNV) at this location, with individuals carrying between 5 and more than 50 repeats. The KIV-2 CNV accounts for 20-70% of the variation seen in Lp(a) levels, owing to an inverse relationship between KIV-2 copy number and Lp(a) concentration. Because the CNV does not account for all of the variation seen in Lp(a) levels, many other variants within the LPA gene are likely to influence Lp(a) levels. While many variants in LPA have been located and investigated, the repetitive nature of the KIV-2 region makes it difficult to sequence and assemble.
In this study, next-generation sequencing (NGS) methods were first used to sequence the LPA gene of 13 null individuals (individuals with no Lp(a) protein present in blood plasma) and one individual with high Lp(a). In total, 68 SNPs were found within LPA: 44 in the non-repeat region and 24 in the KIV-2 repeat region. Those in the non-repeat region had already been identified, and some had been functionally characterised. However, some of the SNPs identified within the KIV-2 region were novel, and none of these had been functionally investigated.
To better understand how these KIV-2 variants affect apo(a) structure and function, overall kringle sequence conservation was investigated and PyMOL modelling of the KIV-2 variants was performed. The analysis indicated that four KIV-2 SNPs fell in residues that are highly conserved across LPA and PLG kringles. Variation in these four residues was also found to result in the loss of interaction with at least one other residue within the kringle structure.
To investigate where the variants lie within the KIV-2 CNV and how many times each is present, null sample 558, which has a high variant load, was selected for nanopore adaptive sampling. Nanopore sequencing, which generates extremely long reads, was used together with the adaptive sampling program to selectively sequence and enrich the KIV-2 region of interest in sample 558. Using these tools and the Canu assembly software, assembly of the KIV-2 repeat region was attempted. This produced a 43,677 bp contig containing seven repeats of KIV-2 and one of the six variants identified in the KIV-2 region of this sample during NGS. Further inspection of the nanopore reads identified three more variants that had been excluded from the final assembly. This indicates that greater coverage of the region is needed and that the assembly process must be optimised further.
The adaptive sampling methods used in this thesis show great potential for the analysis of repeat regions that have been challenging to assemble. Optimising the assembly will make it possible to determine how many times variants are present and where they reside within the KIV-2 region. The protein modelling results presented in this thesis have identified variants that warrant future functional studies to establish their effect on protein production and function and their connection with the null phenotype.
PDF: PlattRuth2020MSc.pdf