Abstract
It has been a long-standing technical challenge to create machines that can perform
human intellectual tasks such as speech processing. Speech recognition is important not
only because speech is the most common means of human communication, but also because,
in some cases, it is the most efficient way to interact with computers or other smart
devices. Despite great advances over recent years in the development of Speech
Recognition Systems (SRS), these systems do not come close to human recognition
performance, and thus machine speech recognition remains an unsolved problem.
The main obstacle to building successful SRS for real-world environments is a lack of
robustness. SRS must work for as many people as possible, and should perform well
under everyday listening conditions. Differences in articulation, accents and speaking
cadence combine to form one of the more pervasive speech recognition problems. The
biggest challenge for SRS is a lack of adequate methods for handling intrinsic variations
in speech. A key human cognitive characteristic is the ability to learn and adapt to new
patterns. Thus, it is important to enable intelligent systems to learn and generalise even
from single instances or limited samples of data, so that new or changed signals (e.g.,
accented speech, noise) could be correctly understood. It has been well demonstrated
that adaptation in SRS is very beneficial.
Evolving Connectionist Systems (ECoS) are neural networks that evolve their
structure through incremental adaptive learning to recognise input and/or output
streams of data. In this research, the ECoS paradigm was adopted for the first time to
develop novel algorithms that address the problem of adapting SRS to new
speakers. A case study was conducted using two sets of speakers from the
TIMIT corpus; speakers of the same dialect region (intra-accent) were adopted as the
baseline data and speakers of a different dialect region (inter-accent) as the adaptation
data. A comparative analysis of ECoS networks against Multi-Layer Perceptron (MLP)
and Fuzzy ARTMAP networks was undertaken. Simple Evolving Connectionist Systems
(SECoS) were shown to outperform the other algorithms used in this study, demonstrating
high generalisation while remaining resistant to forgetting.
In order to demonstrate the generalisation and adaptivity of the SECoS networks, a
further case study was undertaken using a small-vocabulary adaptive word recognition
system. This was developed to control the navigation of a robot named ROKEL. The
implementation facilitated on-line speaker adaptation. An adaptive connectionist
method was also developed to accurately determine the boundaries of speech and
non-speech segments within incoming speech signals. It allowed adaptation to
environmental background noise in order to correctly determine the boundaries of
spoken words.
Significant performance differences exist between noisy and clean speech data in
an otherwise identical task. The effect of various noise levels and conditions on speech
perception in an in-vehicle environment was investigated. It was hypothesised that
filtering noisy speech should improve the performance of SRS because it improves speech
intelligibility. However, an evaluation of several widely used denoising techniques showed
that, in general, SECoS performance decreased on filtered speech. The ANOVA for
recognition results as a function of signal-to-noise ratio (SNR) and speed indicated that
both SNR levels and speed conditions have a significant effect on recognition performance. The effect of
acceleration was not significant when considered independently of SNR. However, if
acceleration conditions create more (engine) noise, the recognition rate is expected to
decrease because of the reduced SNR. The experimental analysis presented in this work
showed that the methods and algorithms developed throughout this thesis are viable.