Signal processing and acoustic modelling of speech signals for speech recognition systems
dc.contributor.author  Abdulla, Waleed H  en_NZ 
dc.date.available  2011-04-07T03:17:03Z  
dc.date.copyright  2002-03  en_NZ 
dc.identifier.citation  Abdulla, W. H. (2002, March). Signal processing and acoustic modelling of speech signals for speech recognition systems (Thesis). Retrieved from http://hdl.handle.net/10523/1491  en 
dc.identifier.uri  http://hdl.handle.net/10523/1491  
dc.description.abstract  Natural man-machine interaction is currently one of the most unfulfilled pledges of automatic speech recognition (ASR). The purpose of an automatic speech recognition system is to accurately transcribe or execute what has been said. State-of-the-art speech recognition systems consist of four basic modules: the signal processing, the acoustic modelling, the language modelling, and the search engine. The subject of this thesis is the signal processing and acoustic modelling modules. We pursue the modelling of spoken signals in an optimum way. The resultant modules can be used successfully for the subsequent two modules. Since the first-order hidden Markov model (HMM) has been a tremendously successful mathematically established paradigm, which makes it the up-to-the-minute technique in current speech recognition systems, this dissertation bases all its studies and experiments on HMM. HMM is a statistical framework that supports both acoustic and temporal modelling. It is widely used despite making a number of suboptimal modelling assumptions, which put limits on its full potential. We investigate how the model design strategy and the algorithms can be adapted to HMMs. Large suites of experimental results are demonstrated to expound the relative effectiveness of each component within the HMM paradigm. This dissertation presents several strategies for improving the overall performance of baseline speech recognition systems. The implementation of these strategies was optimised in a series of experiments. We also investigate selecting the optimal feature sets for speech recognition improvement. Moreover, the reliability of human speech recognition is attributed to the specific properties of the auditory presentation of speech. Thus, in this dissertation, we explore the use of perceptually inspired signal processing strategies, such as critical band frequency analysis. 
The resulting speech representation, called Gammatone cepstral coefficients (GTCC), provides relative improvement over the baseline recogniser. We also investigate multiple signal representations for recognition in an ASR to improve the recognition rate. Additionally, we developed fast techniques that are useful for evaluation and comparison procedures between different signal processing paradigms. The following list gives the main contributions of this dissertation:
• Speech/background discrimination.
• HMM initialisation techniques.
• Multiple signal representation with multistream paradigms.
• Gender-based modelling.
• Feature vector dimensionality reduction.
• Perceptually motivated feature sets.
• ASR training and recognition packages for research and development.
Many of these methods can be applied in practical applications. The proposed techniques can be used directly in more complicated speech recognition systems by introducing their resultants to the language and search engine modules.  en_NZ 
dc.subject  man-machine interaction  en_NZ 
dc.subject  automatic speech recognition  en_NZ 
dc.subject  acoustic modelling  en_NZ 
dc.subject  language modelling  en_NZ 
dc.subject  signal processing strategies  en_NZ 
dc.subject  critical band frequency analysis  en_NZ 
dc.subject  speech recognition systems  en_NZ 
dc.subject.lcsh  T Technology (General)  en_NZ 
dc.subject.lcsh  Q Science (General)  en_NZ 
dc.title  Signal processing and acoustic modelling of speech signals for speech recognition systems  en_NZ 
dc.type  Thesis  en_NZ 
dc.description.version  Unpublished  en_NZ 
otago.bitstream.pages  23  en_NZ 
otago.date.accession  2007-02-16  en_NZ 
otago.school  Information Science  en_NZ 
thesis.degree.discipline  Information Science  en_NZ 
otago.interloan  yes  en_NZ 
otago.openaccess  Abstract Only  
dc.identifier.eprints  542  en_NZ 
otago.school.eprints  Information Science  en_NZ 
dc.description.references  Abdulla, W. H. and N. K. Kasabov (1999a). Speech recognition enhancement via robust CHMM speech background discrimination. Proc. ICONIP/ANZIIS/ANNES'99 International Workshop, New Zealand. Abdulla, W. H. and N. K. Kasabov (1999b). Two pass hidden Markov model for speech recognition systems. Proc. ICICS'99, Singapore. Abdulla, W. H. and N. K. Kasabov (1999c). The concepts of hidden Markov model in speech recognition. IJCNN'99, N. K. Kasabov, W. H. Abdulla (Ed.), Washington, DC, July 1016, Chapter 4. Abdulla, W. H. and N. K. Kasabov (2000). Feature selection for parallel CHMM speech recognition systems. Proc. of the Fifth Joint Conference on Information Sciences, vol.2, pp 874878, Atlantic City, New Jersey, USA. Abdulla, W. H. and N. K. Kasabov (2001). Improving speech recognition performance through gender separation. Proc. Artificial Neural Networks and Expert Systems International Conference (ANNES), pp 218 222, Dunedin, New Zealand. Abramson, N. (1963). Information Theory and Coding. New York, McGrawHill. Abrash, V., H. Franco, M. Cohen, N. Morgan, et al. (1992). "Connectionist gender adaptation in hybrid neural network/hidden Markov model speech recognition system." Proc. ICSLP'92. Aertsen, A. and P. Johannesma (1980). "Spectratemporal receptive fields of auditory neurons in the grass frog. I. Characterization of tonal and natural stimuli." Biol. Cybern. 38: 223  234. Allen, J. B. (1995). Speech and hearing in communication. New York, ASA edition, Acoustical Society of America. Alsabti, K., S. Ranka and V. Singh (1999). An efficient Kmeans clustering algorithm. IPPS/SPDP Workshop on High Performance Data Mining, San Juan, Puerto Rico. Arai, T. and S. Greenberg (1998). Speech intelligibility in the presence of cross channel spectral asynchrony. Proc. IEEE ICASSP'98. Atal, B. S. (1972). "Automatic speaker recognition based on pitch contours." J. Acoust. Soc. Am. 52: 16871697. Ata, B. S. and L. R Rabiner (1976). 
"A pattern recognition approach to voicedunvoicedsilence classification with application to speech recognition." IEEE Trans. ASSP 24(4): 201212. Bahl, L. R., P. F. Brown, P. V. de Souza and R L. Mercer (1988a). Speech recognition with continuous parameter hidden Markov models. Proc. IEEE ICASSP'88, New York, NY. Bahl, L. R., P. F. Brown, P. V. de Souza, R. L. Mercer, et al. (1988b). Acoustic Markov models used in Tangora speech recognition system. Proc. IEEE ICASSP'88, New York, USA. Baker, J. K. (1975a). "The DRAGON system  an overview." IEEE Trans. ASSP 23: 2429. Baker, J. K. (1975b). "The Dragon system  an overview." IEEE Trans. ASSP 23(1): 2429. Baker, J. K., Ed. (1975c). Stochastic modeling for automatic speech understanding. Speech Recognition: Invited paper presented at the 1974 IEEE symposium. New York, Academic Press. BarnwellIII, T. P. (1980). A comparison of parametrically different objective speech quality measures using correlation analysis with subjective quality results. Proc. IEEE ICASSP'80, Denver. Bateman, D. C., D. K. Bye and M. J. Hunt (1992). Spectral contrast normalization and other techniques for speech recognition in noise. Proc. IEEE ICASSP'92, San Francisco, USA. Baum, L. E. (1972). "An inequality and associated miximization technique in statistical estimation for probabilistic functions of Markov processe." Proc. Symp. On Inequalities 3: 17. Baum, L. E. and J. A. Egon (1967). "An inequality with applications to statistical estimation for probabilistic functions of Markov process and to a model for ecology." Bull. Amer. Meteorol. Soc. 73: 360363. Baum, L. E. and T. Petrie (1966). "Statistical inference for probabilistic functions of finite state Markov chains." Ann. Math. Stat. 37: 15541563. Baum, L. E., T. Petrie, G. Soules and N. Weiss (1970). "A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains." Annals of Mathematical Statistics 41(1): 164171. Becchetti, C. and L. P. 
Ricotti (1999). Speech recognition theory and C++ implementation, John Wiley & Sons. Bellegarda, J. and D. Nahamoo (1989). Tied mixture continuous parameter models for large vocabulary isolated speech recognition. Proc. ICASSP'89, Glasgow, Scotland. Bellegarda, J. and D. Naharnoo (1990). "Tied mixture continuous parameter modeling for speech recognition." IEEE Trans. ASSP 38(12): 20332045. Bengio, S. and Y. Bengio (2000a). "Taking on the curse of dimentionality in joint distributions using neural networks." IEEE Trans. Neural Networks 11(3): 550557. Bengio, Y. (1996). Neural Networks for Speech and Sequence Processing, International Thomson Computer Press. Bengio, Y. and S. Bengio (2000b). Modeling highdimensional discrete data with multilayer neural networks. Advances in Neural Information Processing Systems 12. S. A. Jolla, T. K. Leen and K.R. Miler, MIT Press: 400406. Bin, J., T. Calhurst, A. EIJaroudi, R. lyer, et al. (1999). Recent experiments in large vocabulary conversational speech recognition. Proc. IEEE ICASSP'99, Phoenix. Blimes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Berkeley, CA, International Computer Science Institute. Bocchieri, E. and B. Mak (1997). Subspace distribution clustering for continuous observation density hidden Markov models. Proc. Eurospeech. Boll, S. F. (1979). "Suppression of acoustic noise in speech using spectral subtraction." IEEE Trans. ASSP 27(2): 113120. BouGhazale, S. E. and A. 0. Asadi (2000). Handsfree voice activation of personal communication devices. Proc. IEEE ICASSP'2000, Istanbul, Turkey. Bourlard, H., S. Bengio and K. Weber (2001). New approaches towards robust and adaptive speech recognition. Advances in Neural Information Processing Systems 13. T. K. Leen, T. G. Dietterich and V. Tresp, MIT Press: 751757. Bourlard, H., S. Dupont and C. Ris (1996). Multistream speech recognition. Bourlard, H. and N. Morgan (1993). 
Connectionist Speech Recognition. A Hybrid Approach. Boston, Kluwer Academic Publishers. Bourlard, H. and N. Morgan (1994). Connectionist Speech Recognition, Kluwer Academic Publishers. Bourlard, H., C. J. Wellekens and H. Ney (1984). Connected digit recognition using vector quantization. Proc. IEEE ICASSP'84, San Diego, USA. Burton, D. K. and J. E. Shore (1985). "Speakerdependent isolated word recognition using speakerindependent vector quantization coodbooks augmented with speakerspecific data." IEEE Trans. ASSP 33(2): 440443. Chan, A. K. and S. J. Liu (1998). Wavelet Toolware: Software for Wavelet Training, Academic Press. Chen, S. S. and R. A. Gopinath (2001). Gaussianization. Advances in Neural Information Processing Systems 13. T. K. Leen, T. G. Dietterich and V. Tresp, MIT Press: 821827. Chow, Y. L., M. 0. Dunham, 0. A. Kimball, M. A. Krasner, et al. (1987). BYBLOS: The BBN continuous speech recognition system. Proc. IEEE ICASSP'87, Dallas, USA. Cooke, M. (1993). Modelling Auditory Processing and Organization. U.K., Cambridge University Press. Cosi, P., Ed. (1999). Auditory modeling and neural networks. Speech Processing, Recognition and Artificial Neural Networks. London, Springer Verlag. Cover, T. M. (1977). "On the possible ordering in the measurement selection problem." IEEE Trans. on Systems, Man, and Cybernetics 7(9): 657661. Crochiere, R. E. and L. Rabiner (1983). Multirate Digital Signal Processing. Englewood Cliffs, NJ, Prentice Hall. Das, S., R. Bakis, A. Nadas, D. Nahamoo, et al. (1993). Influence of background noise and microphone on the performance of the IBM TANGORA speech recognition system. Proc. IEEE ICASSP'93. Das, S., A. Nadas, D. Nahamoo and M. Picheny (1994). Adaptation techniques for ambience and microphone compensation in the IBM Tangora speech recognition system. Proc. IEEE ICASSP'94, Adelaide, Australia. Daubechies, I. (1990). "The wavelet transform, timefrequency localization and signal analysis." IEEE Trans. IT 36(5): 961 1005. 
Daubechies, I., Ed. (1992a). Ten Lectures on Wavelets. CBMSNSF Regional Conference Series in Applied Mathematics. Philadelphia, Pennsylvania, SIAM Press. Daubechies, I. (1992b). Ten lectures on wavelets, SIAM. Davis, B. and P. Mermelstein (1980). "Comparison of parametric representations for monosylabic word recognirion in continuously spoken sentences." IEEE Trans. ASSP 28(4): 357366. Davis, S. B., Ed. (1990). Comparison of parametric representations for monosyllabic word recognition in continuous spoken sentences. Readings in Speech Recognition. deBoer, E. and H. R. de Jongh (1978). "On cochlea encoding: Potentialities and limitations of the reversecorrelation technique." J. Acoust. Soc. Am. 63(1): 115 135. deBoer, E. and P. Kuyper (1968). "Triggered correlation." IEEE Trans. Biomed. Eng. BME15: 169  179. DeIler, J. R., J. G. Proakis and J. H. Hansen (1993). DiscreteTime Processing of Speech Signals. New York, Macmillan Publishing. Dempster, A. P., N. M. Laird and D. B. Rubin (1977). "Maximum likelihood from incomplete data via the EM algorithm." J. Royal Statist. Soc. Ser. B 39(1): 138. Devijver, P. A. and J. Kittler (1982). Pattern Recognition: A Statistical Approach. Englewood Cliffs, NJ, PrenticeHall. Donoho, D. L., Ed. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. Proceedings of the Symposia in Applied Mathematics, American Mathematical Society. Donoho, D. L. (1995). "Denoising by softthresholding." IEEE Trans. IT 41(3): 613627. Donoho, D. L. and I. M. Johnstone (1994). "Ideal spatial adaptation by wavelet shrinkage." Biometrika 81: 425455. Draper, N. R. and H. Smith (1981). Applied Regression Analysis. New York, Wiley. Duda, R. 0. and P. E. Hart (1973). Pattern Classification and Scene Analysis. New York, Wiley. Duncan Luce, R. (1993). Sound & Hearing A Conceptual Introduction, Lawrence Erlbaum Associates. Elenius, K. and M. Blomberg (1982). 
Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system. Proc. ICASSP'86. Ellermann, C., S. V. Even, C. Huang and L. Manganaro (1993). "Dragon systems' experiences in small to large vocabulary multilingual speech recognition applications." Proc. Eurospeech 3: 20772080. Ellis, D. (2001). TANDEM acoustic modeling in largevocabulary recognition. Proc. ICASSP'2001, Salt Lake City. Ellis, D. and J. A. Bilmes (2000). Using mutual information to design feature combinations. Proc. ICSLP2000, Beijing. Ellis, D. P. (2000a). Improved recognition by combining different features and different systems. Proc. AVIOS2000, San Jose. Ellis, D. P. (2000b). Using mutual information to design feature combinations. Proc. ICSLP2000, Beijing. Ellis, D. P., R. Singh and S. Sivadas (2001). Tandem acoustic modelling in largevocabulary recognition. Proc. IEEE ICASSP'2001, Salt Lake City. Elman, J. L. (1990). "Finding structure in time." Cognitive Science 14(2): 179211. Ferguson, J. D. (1980). Hidden Markov Analysis: An Introduction. Princeton, NJ, Institute of Defence Analyses. Fermin, C. D. (2000). Very Detailed Tutorial on the ear, http://www1.omi.tulane.edu/departments/pathology/fermin/Hearing.html. 6000. Forney, G. D. (1973). "The Viterbi algorithm." Proc. IEEE 61: 268278. Furui, S. (1986a). Speaker independent isolated word recognition based on emphasised spectral dynamics. Proc. ICASSP'86, Tokyo  Japan. Furui, S. (1986b). Speaker independent isolated word recognition based on emphasized spectral dynamics. Proc. IEEE ICASSP'86, Tokyo Japan. Furui, S. (1986c), "Speaker independent isolated word recognition using dynamic features of speech recognition." IEEE Trans. ASSP 34(2): 5259. Furui, S. (1986d). "Speakerindependent isolated word recognition using dynamic features of speech spectrum." IEEE Trans. ASSP 34: 5259. Furui, S. (1988). 
"A VObased preprocesor using cepstral dynamic features for speakerindependent large vocabulary word recognition." IEEE Trans. ASSN 36(7): 980987. Garofolo, J. S., L. F. Lame!, W. M. Fisher, J. G. Fiscus, et al. (1990). DARPA TIM IT AcousticPhonetic Continuous Speech Corpus CDROM. NISTIR 4930. Gauvain, J.L. and C.H. Lee (1994). "Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains." IEEE Trans. SAP 2(2): 291298. Ghitza, 0., Ed. (1992). Auditory nerve representation as a basis for speech processing. Advances in Speech Signal Processing. New York, Marcel Dekker. Glasberg, B. R. and B. C. Moore (1990). "Derivation of auditory filter shapes from notchednoise data." Hearing Research 47: 103  108. Glinski, S. C. (1985). "On the use of vector quantization for connecteddigit recognition." The AT & T Tech, J. 64(5): 10331045. Gong, Y. (1995). "Speech recognition in noisy environments: A survey." Speech Communication 16: 261291. Graps, A. (1995). "An introduction to wavelets." IEEE Computational Science and Engineering 2(2). Gravier, G., M. Sigelle and G. Chollet (1998). "Marrkov random field modelling for speech recognition." Australian J. of Intelligent Information Processing Systems 5(4): 245251. Gray, J. and J. D. Markel (1976). "Distance measures for speech processing." IEEE Trans. ASSP 24(5): 380391. Gray Jr, A. H. and J. D. Markel (1974). "A spectral flatness measure for studying the autocorrelation method of linear prediction of speech analysis." IEEE Trans. ASSP ASSP22: 607  217. Gray, R. M. (1984). "Vector quantization." IEEE ASSP Magazine 1(2): 4  29. Gray, R. M., A. Buzo, J. Gray and Matsuyama (1980). "Distortion measures for speech processing." IEEE Trans. ASSP 68(4): 367376. Greenberg, S., T. Arai and R. Silipo (1998). Speech derivation from exceedingly sparse spectral information. Proc. IEEE ICSLP'98. Greenwood, D. (1961). "Critical bandwidth and the frequency coordinates of the basilar membrane." J. Acoust. Soc. Am. 
33: 1344  1356. Greenwood, D. (1990). "A cochlear frequencyposition function for several species29 years later." J. Accost, Soc. Am, 87(6): 2592  2605. Gupta, V. N., M. Lennig and P. Mermelstein (1987a). Integration of acoustic information in a large vocabulary word recognizer. Proc. ICASSP'87, Dallas, USA. Gupta, V. N., M. Lenning and P. Mermelstein (1987b). Integration of acoustic information in a large vocabulary word recognizer. Proc. IEEE ICASSP'87. HaebUmbach, R. (1999a). Investigations on interspeaker variability in the feature space. Proc. ICASSP'99, Phoenix, Arizona. HaebUmbach, R. (1999b). Investigations on interspeaker variability in the feature space. Proc. IEEE ICASSP'99, ArizonaUSA. Hansen, J. H. and B. L. Pellom (1998). An effective quality evaluation protocol for speech enhancement algorithms. Proc. ICSLP'98, Sydney, Australia. Hanson, B. A. and T. H. Applebaum (1990a). Features for noiserobust speakerindependent word recognition. Proc. Int. Conf. Spoken Language Processing (ICSLP), KobeJapan. Hanson, B. A. and T. H. Applebaum (1990b). Robust speaker independent word recognition using static, dynamic, and acceleration features: experiments with lombard and noisy speech. Proc. IEEE ICASSP'90, Albuquerque, NM. Hanson, B. A. and T. H. Applebaum (1990c). Robust speaker independent word recognition using static, dynamic, and acceleration features: experiments with lombard and noisy speech. Proc. ICASSP'90, Albuquerque, NM. Hanson, B. A., T. H. Applebaum and J.C. Junqua (1996a). Spectral dynamics for speech recognition under adverse conditions. Automatic Speech and Speaker Recognition Advanced Topics. C.H. Lee, F. K. Soong and K. K. Paliwal. Hanson, B. A., T. H. Applebaum and J.C. Junqua, Eds. (1996b). Spectral dynamics for speech recognition under adverse conditions. Automatic Speech and Speaker Recognition, Kluwer Academic Publishers. Harborg, E. (1990). Hidden Markov Models Applied to Automatic Speech Recognition,. 
PhD Thesis, Norwegian Institute of Technology (Trondheim). Harrington, J. and S. Cassidy (1999). Techniques in Speech Acoustics, Kluwer Academic Publishers. Hartmann, W. M. (1998). Signals, Sound, and Sensation, Springer Verlag. Herrnansky, H. (1990a). "Perceptual linear predictive (PLP) analysis for speech." J. Acoust. Soc. Am. 87: 17381752. Hermansky, H. (1990b). "Perceptual linear predictive (PLP) analysis of speech." J. Acoust. Soc. Am. 87(4): 17381752. Hermansky, H. (1997). Should recognizers have ears? Proc. ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, France. Hermansky, H. (1999). Analysis in automatic recognition of speech. Speech Processing, Recognition and Artificial Neural Networks. G. Chollet, D. Di Benedetto, A. Esposito and M. Marinaro, SpringerVerlag: 115137. Hermansky, H., D. Ellis and S. Sharma (2000). TANDEM connectionist feature extraction for conventional HMM systems. Proc. ICASSP2000, Istanbul. Hermansky, H. and N. Malayath (1998). Spectral basis functions from discriminant analysis. ICSLP'98, Sydney, Australia. Hermansky, H. and S. Sharma (1999). Temporal patterns (TRAPS) in ASR of noisy speech. Proc. ICASSP'99, Phoenix, AZ. Hertz, J., A. Krogh and R G. Palmer (1991). Introduction To The Theory Of Neural Computing, AddisonWesley Publishing Company. Hess, W. (1983). Pitch determination of speech signals. New York, SpringerVerlag. Hochberg, M., S. Rentals, A. J. Robinson and G. D. Cook (1995). Recent improvements to the ABBOT large vocabulary CSR system. Proc. IEEE ICASSP'95. Huang, L.S. and C.h. Yang (2000). A novel approach to robust speech endpoint detection in car environment. Proc. IEEE ICASSP'2000, Istanbul, Turkey. Huang, X., A. Acero, F. Allova, M. Y. Hwang, et al. (1995). "Microsoft windows highly intelligent speech recognizer: Whisper." Proc. IEEE ICASSP'95 1: 9397. Huang, X. D. (1992). Minimizing speaker variation effects for speakerindependent speech recognition. 
Proceedings of Speech and Natural Language Workshop. Huang, X. D., M. A. Ariki and M. A. Jack (1990a). Hidden Markov Models for Speech Recognition. Edinburgh, Edinburgh University Press. Huang, X. D., Y. Ariki and M. A. Jack (1990b). Hidden Markov Models for Speech Recognition, Edinburgh University Press. Huang, X. D., H. W. Hon, M. Y. Huang and K. F. Lee (1993). "A comparative study of discrete semicontinuous, and continuous hidden Markov models." Computer Speech and Language 7: 359368. Huang, X. D. and M. A. Jack, Eds. (1990). Semicontinuous hidden Markov models for speech signals. Readings in Speech Recognition, Morgan Kaufmann. Huang, X. D., K.F. Lee, H. W. Hon and M. Y. Hwang (1991). Improved acoustic modeling for the SPHINX speech recognition system. Proc. IEEE ICASSP'91, Toronto, Canada. Humphries, J. J. and P. C. Woodland (1997). Using accentspecific pronounciation modelling for improved large vocabulary continuous speech recognition. Proc. Eurospeech. Hunt, M. J., S. M. Richardson, D. C. Bateman and A. Piau (1991). An investigation of PLP and IMELDA acoustic representations and of their potential for combination. Proc. IEEE ICASSP'91, Toronto, Canada. Huo, Q., C. Chan and C.H. Lee (1995). "Bayesian adaptive learning of the parameters of hidden Markov models for speech recognition." IEEE Trans. SAP 3(5): 334345. Hwang, M. Y. (1993). Subphonetic acoustic modeling for speaker independent continuous speech recognition, CMU. Hwang, M. Y. and X. D. Huang (1992). Subphonetic modeling with Markov statessenone. Proc. IEEE ICASSP'92. Itakura, F. (1975a). "Minimum prediction residual principal applied to speech recognition." IEEE Trans. ASSP 23(1): 6772. Itakura, F. (1975b). "Minimum prediction residual principle applied to speech recognition." IEEE Trans. ASSP 23(1): 7672. Janin, A., D. Ellis and N. Morgan (1999a). Multistream speech recognition: Ready for prime time? Proc. Eurospeech'99, Budapest. Janin, A., D. P. Ellis and N. Morgan (1999b). 
Multistream speech recognition ready for prime time? Proc. Eurospeech, Budapest. Jankowski Jr., C. R, H.D. H. Vo and A. P. Lippmann (1995). "A comparison of signal processing front ends for automatic word recognition." IEEE Trans. Speech and Audio Processing 3(4): 286  293. Jankowskijr, C. R. (1995). "A comparison of signal processing front ends for automatic word recognition." IEEE Trans. SAP 3(4): 286293. Jayant, N. 0. S. and P. Noll (1984). Digital coding of waveforms, Prentice Hall. Jelinek, F. (1976). "Continuous recognition by statistical methods." Proc. IEEE 64: 532555. Jelinek, F. (1998). Statistical Methods for Speech Recognition, The MIT Press. Jelinek, F., L. R. Bahl and R. L. Mercer (1975). "Design of a linguistic statistical decoder for the recognition of continuous speech." IEEE Trans. on Information Theory 21: 250256. Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. Proceedings of The Eighth Annual Conference of the Cognitive Science Society, Amherst, MA. Jordan, M. I., Ed. (1989). Serial Order: A Parallel, Distributed Processing Approach. Hillsdale: Erlbaum. Juang, B. H. (1991). "Speech recognition in adverse environments." Computer Speech & Language 5: 275294. Juang, B. H., L. R. Rabiner, S. E. Levinson and M. M. Sondhi (1985). Recent developments in the application of hidden Markov models to speaker independent isolated word recognition. Proc. IEEE ICASSP'85, Tampa. Kadambe, S. and G. F. BoudreauxBartels (1992). "Application of the wavelet transform for pitch detection of speech signals." IEEE Trans. Information Theory 38(2): 917  964. Kailath, T. (1967). "The divergence and Bhattacharyya distance measure in signal selection." IEEE Trans. COM 15: 5260. Kasabov, N. K. (1996). Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. Cambridge, MIT Press. Kingsbury, B. E. D. (1999). 
Perceptuallyinspired signal processing strategies for robust speech recognition in reverberant environments. PhD. Thesis, University of California, Berkeley, CA. Fullback, S. (1959). Information Theory and Statistics. New York, Dover. Lamel, L. F., L. R. Rabiner, A. E. Rosenberg and J. G. Wilpon (1981). "An improved end points detector for isolated word recognition." IEEE Trans. ASSP 29(4): 777785. Lee, C.H. and J.L. Gauvain (1993). "Speaker adaptation based on MAP estimation of HMM parameters." Proc. IEEE ICASSP'93: 652655. Lee, K.F. (1989). Automatic Speech Recognition, Kluwer Academic Publishers. Lee, K.F., H. W. Hon, M. Y. Hwang, S. Mahajan, et al. (1989). The Sphinx speech recognition system. Proc. IEEE ASSP'89, Glasgow. Levinson, S. E., L. R. Rabiner and M. M. Sondhi (1983). "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition." Bell Sys. Tech. J. 62(4): 10351074. Linde, Y., A. Buzo and R. M. Gray (1980). "An algorithm for vector quantizer design." IEEE Trans. Commun. 28(1): 8495. Liu, F., H, R. M. Stern, A. Acero and P. J. Moreno (1994). Environment normalization for robust speech recognition using direct cepstral comparison. Proc. IEEE ICASSP'94, Adelaid, Australia, Lockwood, P., C. Baillargeat, J. M. Gillot, J. Boudy, et al. (1991). "Noise reduction for speech enhancement in cars: Nonlinear spectral subtraction  Kalman filtering." Proc. Eurospeech. Lockwood, P. and J. Boudy (1992). "Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars." Speech Communication 11(23): 215 228. Looney, C. G. (1997). Pattern Recognition Using Neural Networks, Oxford University Press. Lyon, R. F. (1997). Allpole models of auditory filtering. Diversity in auditory mechanics. Lewis. Singapore, World Scientific Publishing: 205  211. Makhoul, J. (1975). "Linear Prediction: A Tutorial Review." 
Proceedings of the IEEE 63(4): 561  580. Makhoul, J. and Cossel (1976). LPCW: An LPC vocoder with linear predictive spectral warping. Proc. ICASSP°76, Philadelphia, USA. Makhoul, J., S. Roucos and H. Gish (1985). "Vector quantization in speech coding." Proceedings of the IEEE 73(11): 15511588. Mallat, S. (1989). "A theory for multiresolution signal decomposition: the wavelet representation." IEEE Trans. PAMI 11(7): 674  693. Manning, C. D. (1999). Foundation of Statistical Natural Language Processing, The MIT Press. Markel, J. D. and A. H. Gray Jr. (1976). Linear prediction of speech, SpringeVerlag. McLachlan, G. J. and T. Krishnan (1997). The EM Algorithm and Extensions, John Wiley & Sons, Inc. Meilijson, I. (1989). "A fast improvement to the EM algorithm on its own terms." J. Royal Statist. Soc. Ser. B 51(1): 127138. Mermelstein, P., Ed. (1976). Distance measures for speech recognition, psychological and instrumental. Pattern Recognition and Artificial Intelligence. Ming, J. and F. J. Smith (2000). A probabilistic union model for subband based robust speech recognition. Proc. IEEE ICASSP2000, Istanbul, Turkey. Misiti, M., Y. Misiti, G. Oppenheim and J. M. Poggi (1996). Wavelet Toolbox, Math Works Inc. Mokbel, C. and G. Collet (1991). Speech recognition in adverse environments: speech enhancement and spectral transformations. Proc. IEEE ICASSP'91, Toronto, Canada. Moon, T. K. (1996). "The expectation maximization algorithm." Signal Processing 13(6): 4760. Moon, T. K. and W. C. Stirling (2000). Mathematical Methods and Algorithms for Signal Processing, PrenticeHall, inc. Moore, B. C. (1977). Introduction to the Psychology of Hearing. Baltimore, Md., University Park Press. Moore, B. C. J. and B. R. Glasberg (1983). "Suggested formulae for calculating auditory filter bandwidths and excitation patterns." J. Acoust. Soc. Am. 74(3): 750  753. Morgan, N. and H. Bourlard (1995). "Continuous speech recognition: an introduction to the hybrid HMM/connectionist approach." 
Signal Processing Magazine May: 2542, Murphy, K. M. (1996). Sound and auditory system lecture, http://www.science.mcmastenca/Psychology/psych2e03/lecture9/sound.au dsys.lectu re. html. 2000. Myers, C., L. R. Rabiner and A. E. Rosenberg (1980). "Performance tradeoffs in dynamic time warping algorithms for isolated word recognition." IEEE Trans. ASSP 28(6): 623635. Nadas, A. (1983). "A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood." IEEE Trans. ASSP 31: 814817. Ney, H. (1990). Experiments on mixturedensity phoneme modeling for the speaker of independent 1000word speech recognition DARPA task. Proc. IEEE ICASSP'90, Albuquerque, NM. Nguyen, L. and R. M. Schwartz (1997). Efficient 2pass Nbest decoder. Proc. Eurospeech. Normandin, Y. (1996). Maximum mutual information estimation of hidden Markov models. Automatic Speech and Speaker Recognition Advanced Topics. C.H. Lee, F. K. Soong and K. K. Paliwal. Oppenheim, A. V. and R. W. Shafer (1989). Digital signal processing, Prentice Hall. O'Shaughnessy, D. (1987). Speech communication: human and machines. New York, USA, Addison Wesely. Owens, F. J. (1993a). Signal processing of speech, Macmillan. Owens, F. J. (1993b). Signal Processing of Speech, Macmillan Press Ltd., London. Padmanabhan, M., L. R. Bahl, D. Nahamoo and P. V. de Souza (1997). Decision tree based quantization of the feature space of a speech recognizer. Proc. Eurospeech. Paliwal, K. K. (1992). "Dimensionality reduction of the enhanced feature set for the HMMbased speech recognizer." Digital Signal Processing 2: 157173. Papamichalis, P. (1987). Practical approaches to speech coding. New Jersey, USA, PrenticeHall, Inc., Englewood Cliffs. Patterson, R. D. (1976). "Auditory filter shapes derived with noise stimuli." J. Acoust. Soc. Am. 59: 640  654. Patterson, R D. (1994). "The sound of a sinusoid: Spectral models." J. Acoust. Soc. Am. 96(3): 1409  1418. 
Patterson, R. D., M. H. Allerhand and C. Giguere (1995). "Timedomain modelling of peripheral auditory processing: A modular architecture and a software platform." J. Acoust. Soc. Am. 98: 18901894, Patterson, R. D., I. NimmoSmith, D. L. Weber and R. Milroy (1982). "The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram and speech threshold." J. Acoust, Soc. Am. 72: 1788  1803. Paul, D. B. (1990). "Speech recognition using hidden Markov models." The Lincoln Laboratory Journal 3: 4162. Pickles, J. 0. (1988). An introduction to the physiology of hearing. New York, Academic Press. Picone, J. W. (1993). "Signal modeling techniques in speech recognition." Proceedings of the IEEE 81(9): 1215  1247. Polikar, R. (1999). The wavelet tutorial, http://www.public.iastate.edu/rpolikar/WAVELETS. Poritz, A. B. (1988). "Hidden Markov models: A guided tour." Proc. ICASSP'88 1: 713. Poritz, A. B. and A. G. Richter (1986). On hidden Markov models in isolated word recogntion. Proc. IEEE ICASSP'86, Tokyo. Price, R. (1958). "A useful theorem for nonlinear devices having Gaussian inputs." IRE Trans. Inf. Theory IT4: 69  72. Pruzansky, S. (1964). "Talker recognition procedure based on analysis of variance." J. Acoust. Soc. Am. 36: 20412047. Purvis, M. (2001). Performance improvement due to multistreaming. Information Science Deptartment, University of Otago, New Zealand (Personal Communication). Quackenbush, S. R., T. P. Barnwell III and M. A. Clements (1988a). Objective Measures of Speech Quality, PrenticeHall, Englewood Cliffs, NJ. Quackenbush, S. R., T. P. BarnwellIII and M. A. Clements (1988b). Objective Measures of Speech Quality. NJ, Prentice Hall. Rabiner, L. (1977). "On the use of autocorrelation analysis for pitch detection." IEEE Trans. ASSP 25(1): 24  33. Rabiner, L., M. J. Cheng and A. E. Rosenberg (1976). "Acomparative performance study of several pitch detection algorithms." IEEE Trans. ASSP 24(5): 399  417. Rabiner, L. and B.H. 
Juang (1993a). Fundamentals of speech recognition. New Jersey, Prentice Hall. Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition." Proc. IEEE 77(2): 257-286. Rabiner, L. R. and B. H. Juang (1986). "An introduction to hidden Markov models." IEEE ASSP Magazine 3(1): 4-16. Rabiner, L. R. and B. H. Juang (1993b). Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, New Jersey. Rabiner, L. R. and B.-H. Juang (1993c). Fundamentals of Speech Recognition. Englewood Cliffs, NJ, Prentice Hall. Rabiner, L. R., B. H. Juang, S. E. Levinson and M. M. Sondhi (1985a). "Recognition of isolated digits using hidden Markov models with continuous mixture densities." AT & T Technical Journal 64(6): 1211-1234. Rabiner, L. R., B. H. Juang, S. E. Levinson and M. M. Sondhi (1985b). "Recognition of isolated digits using hidden Markov models with continuous mixture densities." The AT & T Tech. J. 64(6): 1211-1233. Rabiner, L. R., S. E. Levinson and M. M. Sondhi (1983). "On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition." The Bell System Technical Journal 62(4): 1075-1105. Rabiner, L. R., K. C. Pan and F. K. Soong (1984a). "On the performance of isolated word speech recognizers using vector quantization and temporal energy contours." The AT & T Tech. J. 63(7): 1245-1260. Rabiner, L. R. and M. R. Sambur (1975). "An algorithm for determining the endpoints of isolated utterances." Bell Syst. Tech. J. 54(2): 297-315. Rabiner, L. R., M. M. Sondhi and S. E. Levinson (1984b). "A vector quantizer combining energy and LPC parameters and its application to isolated word recognition." The AT & T Tech. J. 63(5): 721-735. Rabiner, L. R., J. G. Wilpon and B. H. Juang (1986). "A model-based connected-digit recognition system using either hidden Markov models or templates." Computer Speech & Language 1(1): 167-197. Ravishankar, M., R. Bisiani and E. Thayer (1997).
"Sub-vector clustering to improve memory and speed performance of acoustic likelihood computation." Proc. Eurospeech 151-154. Rumelhart, D. E., G. E. Hinton and R. J. Williams, Eds. (1986). Learning Internal Representation by Error Propagation. Parallel Distributed Processing: Exploration in the Microstructures of Cognition. Cambridge, MA, MIT Press. Saito, S. and K. Nakata (1985). Fundamentals of speech signal processing, Academic Press. Sakoe, H. and S. Chiba (1978a). "Dynamic programming algorithm optimization for spoken word recognition." IEEE Trans. ASSP 26: 43-49. Sakoe, H. and S. Chiba (1978b). "Dynamic programming algorithm optimization for spoken word recognition." IEEE Trans. ASSP 26(1): 43-49. Sambur, M. R. (1975). "Selection of acoustic features for speaker identification." IEEE Trans. ASSP 23: 176-182. Schalkoff, R. J. (1992). Pattern Recognition: Statistical, Structural and Neural Approaches, John Wiley & Sons, Inc. Schroeder, M. R. (1999). Computer Speech: Recognition, Compression, Synthesis, Springer-Verlag. Scott, D. W. (1992). Multivariate Density Estimation. New York, Wiley. Seneff, S. (1986). A computational model for the peripheral auditory system: application to speech recognition research. Proc. IEEE ICASSP'86. Sharma, S., D. P. Ellis, S. Kajarekar, P. Jain, et al. (2000). Feature extraction using nonlinear transformation for robust speech recognition on the Aurora database. Proc. IEEE ICASSP'2000, Istanbul. Shin, W.-H., B.-S. Lee, Y.-K. Lee and J.-S. Lee (2000). Speech/non-speech classification using multiple features for robust endpoint detection. Proc. IEEE ICASSP'2000, Istanbul, Turkey. Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall. Slaney, M. (1988). Lyon's cochlear model, Apple Computer Inc. Slaney, M. (1993). An efficient implementation of the Patterson-Holdsworth auditory filter bank, Apple Computer Inc. Slaney, M. and R. F. Lyon (1990). A perceptual pitch detector. ICASSP'1990. Somervuo, P. (1996).
Speech recognition using context vectors and multiple feature streams. MSc Thesis, Helsinki University of Technology. Smith, L. S., Ed. (1997). Extracting features from the short-term time structure of cochlear filtered sound. Proceedings of the Fourth Neural Computation and Psychology Workshop: Connectionist Representations. London, Springer-Verlag. Solbach, L. (1998). An architecture for robust partial tracking and onset localization in single channel audio signal mixes. PhD thesis, Technical University of Hamburg-Harburg (Germany). Strang, G. and T. Nguyen (1996). Wavelets and Filter Banks. Wellesley, MA, Wellesley-Cambridge Press. Tchorz, J., K. Kasper, H. Reininger and B. Kollmeier (1997). On the interplay between auditory-based and locally recurrent neural networks for robust speech recognition in noise. Proc. Eurospeech'97, ESCA, Patras, Greece. Tchorz, J., M. Kleinschmidt and B. Kollmeier, Eds. (2001). Noise suppression based on neurophysiologically-motivated SNR estimation for robust speech recognition. Advances in Neural Information Processing Systems 13, MIT Press. Tchorz, J. and B. Kollmeier (1999). A psychoacoustical model of the auditory periphery as front end for ASR. Proc. ASA/EAA/DEGA Joint Meeting on Acoustics, Berlin. Tibrewala, S. and H. Hermansky (1997). Multi-stream approach in acoustic modelling. Proc. LVCSR-Hub5 Workshop. Tou, J. T. and R. C. Gonzalez (1974). Pattern Recognition Principles, Addison-Wesley Publishing Company. Trentin, E. (2001). Robust Combination of Neural Networks and Hidden Markov Models for Speech Recognition. PhD Thesis, University of Florence (Italy). Tribolet, J. M., L. R. Rabiner and M. M. Sondhi (1979). "Statistical properties of an LPC distance measure." IEEE Trans. ASSP 27(5): 550-558. Tsakalidis, S., V. Digalakis and L. Neumeyer (1999). Efficient speech recognition using subvector quantization and discrete-mixture HMMs. Proc. IEEE ICASSP'99, Phoenix, Arizona. Umesh, S., L. Cohen and D. Nelson (1999).
Fitting the Mel scale. Proc. IEEE ICASSP'99, Phoenix, Arizona. Valens, C. (1999). A really friendly guide to wavelets, http://perso.wanadoo.fr/polyvalens/clemens/clemens.html. Valtchev, V. (1995). Discriminative Methods in HMM-based Speech Recognition. PhD Thesis, University of Cambridge. Wand, M. P. and M. C. Jones (1995). Kernel Smoothing, Chapman & Hall. Warren, R. M. (1982). Auditory Perception: A New Synthesis, Pergamon Press. Wilpon, J. G. and L. R. Rabiner (1985). "A modified K-means clustering algorithm for use in isolated word recognition." IEEE Trans. ASSP 33(3): 587-594. Wilpon, J. G., L. R. Rabiner and T. B. Martin (1984). "An improved word detection algorithm for telephone quality speech incorporating both syntactic and semantic constraints." AT & T Tech. J. 63(3): 479-498. Wolf, J. J. (1972). "Efficient acoustic parameters for speaker recognition." J. Acoust. Soc. Am. 51: 2044-2056. Wolpert, D. H. (1992). "Stacked generalization." Neural Networks 5(2): 241-259. Woodland, P. C., M. J. F. Gales and D. Pye (1996). Improving environmental robustness in large vocabulary speech recognition. Proc. ICASSP'96. Wu, C. F. J. (1983). "On the convergence properties of the EM algorithm." The Annals of Statistics 11(1): 95-103. Wu, S., B. Kingsbury, N. Morgan and S. Greenberg (1998). Performance improvements through combining phone and syllable-length information in automatic speech recognition. Proc. ICSLP'98, Sydney. Young, S. (1996). "Large vocabulary continuous speech recognition: a review." Young, S., D. Kershaw, J. Odell, D. Ollason, et al. (2000). The HTK Book (version 3.0), Microsoft Corporation. Zhu, Q. and A. Alwan (2000). On the use of variable frame rate analysis in speech recognition. Proc. IEEE ICASSP'2000, Istanbul, Turkey. Zweig, G. and S. Russell (1998). "Probabilistic modeling with Bayesian networks for automatic speech recognition." Australian J. of Intelligent Information Processing Systems 54(4): 253-259. Zwicker, E. (1961).
"Subdivision of the audible frequency range into critical bands." J. Acoust. Soc. Am. 33: 248.  en_NZ 
This item appears in the following Collection(s)

Information Science [477]

Thesis  Doctoral [2524]