Evolving systems for connectionist-based speech recognition
|dc.identifier.citation||Kilgour, R. (2003, June 18). Evolving systems for connectionist-based speech recognition (Thesis, Doctor of Philosophy). Retrieved from http://hdl.handle.net/10523/1481||en|
|dc.description||xv, 519 p. ; 30 cm. Includes bibliographical references. University of Otago department: Information Science. "June 18, 2003".|
|dc.description.abstract||Although it has been studied for many years, speech recognition is still a developing field. Several prominent researchers have recently identified areas that need to be addressed: robustness to varied environments, large or expandable vocabularies, user-friendliness, high recognition accuracy and the ability to recognise continuous speech. The ability to adapt is an important component of a speech recognition system: new users should enjoy the benefits listed above, the system should cope with different speaking rates, and novel environments should not cause a drop in performance. A common target for speech recognition algorithms is to detect the presence of speech units, typically phonemes. This approach groups speech sounds, or phones, into abstract categories that distinguish meaning. Artificial neural networks have recently been applied to this task; nevertheless, uncertainty and ambiguity are inherent in the neural network recognition process. Several novel techniques are proposed to aid the recognition process and to help fulfil the requirements of a successful speech recognition system. The goal of this research is to investigate theories of speech and language processing relevant to speech recognition and spoken language understanding. These theories have their foundations in fields such as engineering, computer science, linguistics, natural language processing, psycholinguistics and psychology. An adaptive system is implemented to test the validity and usefulness of such work for speech recognition and spoken language understanding. For example, abstract models of the human auditory system and the auditory cortex are investigated and applied towards better engineering methods for building adaptive speech and language systems. 
For the implementation of an adaptive speech recognition system, parameters are introduced that can be adjusted either manually or automatically; in this manner, the system can adapt to new speakers and environments. The architecture of the system is modular and hierarchical, with different methods applied at different levels: artificial neural networks, for example, are best suited to low-level processing. The work concludes with a discussion of how errors and uncertainty may be resolved in an unsupervised manner. Ideally, the system adapts to the situation, so that future occurrences of such errors are reduced or eliminated.||en_NZ|
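The abstract describes an architecture in which a neural network performs low-level phoneme classification and adjustable parameters let the system adapt to new speakers. A minimal, hypothetical sketch of that idea (not the thesis implementation; all names, the toy data, and the output-layer-only adaptation strategy are assumptions) might look like:

```python
import numpy as np

# Hypothetical sketch: a small feed-forward network classifies acoustic
# feature frames into phoneme classes, and an explicit adaptation step
# adjusts a subset of its parameters on frames from a "new speaker".

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class PhonemeNet:
    def __init__(self, n_features, n_hidden, n_phonemes):
        self.W1 = rng.normal(0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_phonemes))

    def forward(self, X):
        self.H = np.tanh(X @ self.W1)     # low-level feature layer
        return softmax(self.H @ self.W2)  # phoneme class probabilities

    def adapt(self, X, y, lr=0.5, epochs=200):
        # Adaptation sketch: gradient steps on the output layer only,
        # leaving the low-level layer fixed, so only a small set of
        # parameters is adjusted for the new conditions.
        for _ in range(epochs):
            P = self.forward(X)
            T = np.eye(P.shape[1])[y]     # one-hot targets
            self.W2 -= lr * self.H.T @ (P - T) / len(X)

# Toy data: two well-separated "phoneme" classes in a 4-dim feature space.
X = np.vstack([rng.normal(-1, 0.3, (20, 4)), rng.normal(1, 0.3, (20, 4))])
y = np.array([0] * 20 + [1] * 20)

net = PhonemeNet(n_features=4, n_hidden=8, n_phonemes=2)
net.adapt(X, y)  # adjust the output layer to the toy "speaker"
acc = (net.forward(X).argmax(axis=1) == y).mean()
```

Freezing the lower layer and retraining only the output weights mirrors the modular idea in the abstract: the low-level acoustic representation stays shared while higher-level parameters are tuned per speaker or environment.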
|dc.subject||adaptive speech and language systems||en_NZ|
|dc.subject.lcsh||T Technology (General)||en_NZ|
|dc.subject.lcsh||Q Science (General)||en_NZ|
|dc.title||Evolving systems for connectionist-based speech recognition||en_NZ|
|thesis.degree.name||Doctor of Philosophy||en_NZ|
|thesis.degree.grantor||University of Otago||en_NZ|
This item is not available in full-text via OUR Archive.