Signal processing and acoustic modelling of speech signals for speech recognition systems

Waleed H Abdulla

Doctoral Thesis

Doctor of Philosophy - PhD, University of Otago

03/2002

Handle:

https://hdl.handle.net/10523/1491

Abstract

man-machine interaction

automatic speech recognition

acoustic modelling

language modelling

signal processing strategies

critical band frequency analysis

speech recognition systems

T Technology (General)

Q Science (General)

Natural man-machine interaction is currently one of the least fulfilled promises of automatic speech recognition (ASR). The purpose of an automatic speech recognition system is to accurately transcribe or execute what has been said. State-of-the-art speech recognition systems consist of four basic modules: signal processing, acoustic modelling, language modelling, and the search engine. The subject of this thesis is the signal processing and acoustic modelling modules. We pursue the optimal modelling of speech signals, so that the resulting modules can be used successfully with the two subsequent modules.

Since the first-order hidden Markov model (HMM) is a highly successful, mathematically well-established paradigm and the prevailing technique in current speech recognition systems, this dissertation bases all its studies and experiments on the HMM. The HMM is a statistical framework that supports both acoustic and temporal modelling. It is widely used despite making a number of suboptimal modelling assumptions, which limit its full potential. We investigate how the model design strategy and the algorithms can be adapted to HMMs. Extensive experimental results are presented to show the relative effectiveness of each component within the HMM paradigm.

This dissertation presents several strategies for improving the overall performance of baseline speech recognition systems. The implementation of these strategies was optimised in a series of experiments. We also investigate the selection of optimal feature sets for improving speech recognition. Moreover, the reliability of human speech recognition is attributed to the specific properties of the auditory representation of speech. Thus, in this dissertation, we explore the use of perceptually inspired signal processing strategies, such as critical band frequency analysis. The resulting speech representation, called Gammatone cepstral coefficients (GTCC), provides a relative improvement over the baseline recogniser. We also investigate multiple signal representations within an ASR system to improve the recognition rate. Additionally, we developed fast techniques that are useful for evaluating and comparing different signal processing paradigms.

The following list gives the main contributions of this dissertation:
• Speech/background discrimination.
• HMM initialisation techniques.
• Multiple signal representation with multi-stream paradigms.
• Gender-based modelling.
• Feature vector dimensionality reduction.
• Perceptually motivated feature sets.
• ASR training and recognition packages for research and development.

Many of these methods can be applied in practical applications. The proposed techniques can be used directly in more complicated speech recognition systems by passing their outputs to the language modelling and search engine modules.
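
For orientation, a first-order HMM of the kind the abstract refers to assigns a likelihood to an observation sequence through the standard forward recursion. The following is a minimal sketch of that textbook recursion, not code from the thesis. It assumes a discrete-observation HMM (speech systems normally use continuous densities over feature vectors), and the function name `forward_likelihood` and the toy two-state parameters are hypothetical, chosen only for illustration.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations | model) for a discrete first-order HMM.

    initial[i]        : probability of starting in state i
    transition[i][j]  : probability of moving from state i to state j
    emission[i][k]    : probability that state i emits symbol k
    observations      : non-empty sequence of symbol indices
    """
    if not observations:
        raise ValueError("observations must contain at least one symbol")

    n_states = len(initial)

    # alpha[i] = P(o_1 .. o_t, state at time t is i); start with t = 1.
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]

    # First-order assumption: the next state depends only on the current one,
    # so each step needs only the previous alpha vector.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][symbol]
            for j in range(n_states)
        ]

    # Summing over the final states marginalises out the hidden path.
    return sum(alpha)


# Hypothetical two-state, two-symbol model (illustrative values only).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 1])
print(likelihood)
```

This unscaled version underflows on long sequences, so a practical recogniser would typically work with scaled or log-domain probabilities. In a word recogniser, one such model per word (or sub-word unit) would be scored against the same observations and the highest-scoring model selected.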

Details

Record Identifier: 9926478440001891
Title: Signal processing and acoustic modelling of speech signals for speech recognition systems
Creators: Waleed H Abdulla
Degree Awarded: Doctor of Philosophy - PhD
Project Type: Thesis - Doctoral
Academic Unit: Information Science
Awarding Institution: University of Otago
Date published ; e-published: 03/2002
Comment: The full text of this item is not accessible via OUR Archive. Either the full-text file is not available or it is indefinitely restricted because the author has not granted permission for it to be open access. See Interloan information. If Interloan = yes, you may request this item via your library's Interloans service.
Interloan: yes
Resource Type ; Subtype: Doctoral Thesis