Abstract
Transfer Learning (TL) is a design methodology within machine learning (ML) that aims to reuse knowledge gained while solving one problem to solve a different but related problem. In practice, this approach develops ML models on related datasets and transfers their knowledge to a similar task. Although these datasets are typically drawn from related projects, their distributions can differ from one another. Previous works have failed to consider that harmful knowledge transfer, known as negative transfer (NT), can occur in all phases of a transfer learning system, and the NT problem is more evident in the presence of class imbalance. These instances of NT can result from domain divergence, the transfer algorithm used, or the quality of the source data.
This thesis investigates, designs, and implements a comprehensive TL strategy to improve model performance and prevent NT in the presence of class imbalance. Firstly, to design this strategy, the thesis examines the dimensions of TL and proposes a unified taxonomy framework that encompasses them. Secondly, the thesis presents two novel TL source selection methods to mitigate the risk of NT on the target dataset. The first method is recommended when building a source model from scratch, while the second is intended for selecting a source model from a range of established models.
Thirdly, the thesis introduces an information-theoretic measure to determine domain divergence when multiple sources are employed. Fourthly, the thesis employs a systematic approach that combines cost-weighted learning, uncertainty-guided loss functions, and importance sampling to prevent NT. Finally, the thesis conducts ablation studies to understand the effect of combining the optimal source model selection method with methods that improve learning and prevent NT in the presence of class imbalance.
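To make the idea of an information-theoretic divergence measure concrete, the sketch below ranks candidate source domains by their Kullback-Leibler divergence from the target. This is a minimal illustration only: the distributions, domain names, and the choice of KL divergence are assumptions for exposition, not the specific measure defined in the thesis.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical feature-histogram distributions for the target domain
# and two candidate source domains (illustrative values only).
target = [0.5, 0.3, 0.2]
sources = {"source_a": [0.48, 0.32, 0.20],  # close to the target
           "source_b": [0.10, 0.20, 0.70]}  # strongly divergent

# Rank candidate sources by divergence from the target: lower is closer,
# so the least-divergent source is the safer transfer candidate.
ranked = sorted(sources, key=lambda s: kl_divergence(sources[s], target))
```

Under this toy measure, `source_a` ranks ahead of `source_b`, matching the intuition that transferring from the less divergent domain carries lower NT risk.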
The thesis reveals that when the variance error between the source domain and the target domain is reduced, NT is also reduced. Moreover, using an information-theoretic measure to evaluate and select the source model further decreases NT. The thesis finds that performance improves when domain distributions are very similar, and that combining data sources with varying degrees of domain divergence can, in some cases, greatly enhance performance. The thesis also observes that a robust loss function that accounts for the uncertainties around the target data and model can further reduce NT. Finally, the thesis concludes that NT can be avoided by incorporating prior knowledge of the target and source data ratio into the learning process while simultaneously addressing distribution differences and class imbalance in a classification task.
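One way to picture incorporating prior knowledge of class proportions into the learning process is a cross-entropy loss re-weighted by inverse class priors, so that minority-class errors are not drowned out under class imbalance. The function name, the prior values, and the inverse-prior weighting scheme below are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def prior_weighted_log_loss(y_true, y_prob, class_priors, eps=1e-12):
    """Binary cross-entropy where each sample is re-weighted by the
    inverse of its class prior (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    # Inverse-prior weights: rarer classes receive larger weights.
    weights = np.array([1.0 / class_priors[int(c)] for c in y_true])
    nll = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.sum(weights * nll) / np.sum(weights))

# Imbalanced binary task: the positive class is only 10% of the data.
priors = {0: 0.9, 1: 0.1}
y_true = [0, 0, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.4]  # model is under-confident on the positive
loss = prior_weighted_log_loss(y_true, y_prob, priors)
```

Because the lone positive example carries a weight of 10 against roughly 1.1 for each negative, its misclassification dominates the loss, which is the intended effect of encoding the class-ratio prior.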
By applying the proposed NT prevention methods together with the new TL source model selection methods, the thesis shows that combining a method that implicitly prevents NT during the learning phase with an efficient source selection method both improves TL performance and further prevents NT. The contribution of this thesis thus highlights the need to address NT in all phases of a TL system when such a strategy is used for classification tasks.