Abstract
Artificial neural networks are excellent machine learning models but are often referred to as “black boxes”, meaning that the reasoning behind their decisions is obscured. The field of neural network interpretability attempts to explain why these models make the decisions they do. In my research I combine methods for interpreting neural network decisions with the neural network training process to develop networks that learn to solve problems in a specified way. Rather than training neural networks only to maximise prediction accuracy, I train them while enforcing a constraint that the interpretation of the network’s behaviour matches human expectations, with the goal of improving our ability to understand and trust neural networks. Finally, I explore an alternative training objective that seeks to replicate the effects of this guided training method without requiring a predefined set of human expectations.
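
To make the idea of constrained training concrete, the following is a minimal sketch of how an interpretation-matching penalty could be added to a standard training step. It is illustrative only, not the method developed in this research: it assumes a small PyTorch classifier, uses input-gradient saliency as the interpretation, and takes a hypothetical human-provided tensor expected_saliency marking the features a correct explanation should rely on.

    # Minimal sketch: training with an interpretation-matching penalty.
    # Assumptions (not from the source): input-gradient saliency as the
    # interpretation, and a human-provided expected_saliency mask.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    task_loss_fn = nn.CrossEntropyLoss()
    lam = 0.1  # hypothetical weight on the interpretation constraint

    def train_step(x, y, expected_saliency):
        # Track gradients with respect to the inputs to compute saliency.
        x = x.clone().requires_grad_(True)
        logits = model(x)
        task_loss = task_loss_fn(logits, y)

        # Interpretation: input-gradient saliency of the correct-class logits.
        correct_logits = logits.gather(1, y.unsqueeze(1)).sum()
        saliency = torch.autograd.grad(correct_logits, x, create_graph=True)[0]

        # Penalise deviation of the model's saliency from the human expectation.
        interp_loss = ((saliency - expected_saliency) ** 2).mean()

        # Optimise prediction accuracy and the interpretation constraint jointly.
        loss = task_loss + lam * interp_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The key design point is that the saliency is computed with create_graph=True, so the penalty on the interpretation itself is differentiable and shapes the weights during training rather than being applied only as a post-hoc explanation.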