Data Selection Strategies for Bayesian Analysis with Filtering of Genetic Data
Graduate Thesis/Dissertation   Open access

Joshua James Bromell
Master of Science - MSc, University of Otago
University of Otago
2021
Handle: https://hdl.handle.net/10523/10642

Abstract

Keywords: Genotyping by Sequencing, GBS, SNP, Optimisation, Monte Carlo
This thesis investigates strategies for selecting the data used in Bayesian analysis of genotyping by sequencing (GBS) data. Each selection of data leads to a different posterior distribution on the model parameters. Methods for analysing the resulting posterior distributions are discussed and compared, and the most applicable method is applied to a set of simulated genetic markers. Traditionally, GBS data sets are constructed so that every marker is a polymorphic (non-constant) site, for example a single nucleotide polymorphism (SNP). However, there is evidence that this may not be the optimal approach: it may be better to include a certain proportion of sites that are not filtered for polymorphism and may therefore be constant. To assess this, we begin by analysing simplified problems with simpler distributions, which are studied both analytically and using Monte Carlo samples. The decision problem is to choose the optimal proportion of each class of data point to include in the marker data set. The chosen method is then applied to a simulated marker data set, and the results are analysed to show that there appears to be an optimal mixture of data that should be used in any future phylogenetic analysis.
Josh_Bromell_Thesis.pdf
