
dc.contributor.advisor: Deng, Da
dc.contributor.advisor: Cranefield, Stephen
dc.contributor.author: Simmermacher, Christian [en_NZ]
dc.identifier.citation: Simmermacher, C. (2005, November 23). Detecting instruments in classical music: A view on timbre, musical features, and classification (Dissertation, Master of Science). Retrieved from
dc.description.abstract: This thesis describes research on the recognition of classical instruments from isolated tones and musical passages. The intention is to evaluate the robustness of the features and the efficiency of the data analysis methods for classifying instruments in a dynamic environment. The information extracted from the samples is based on relevant work in instrument detection and timbre studies, and on knowledge about instrument characteristics. For a better understanding of the perceptually derived features, the human auditory system is briefly reviewed and some perceptual phenomena are mentioned. Different feature extraction methods are chosen without applying sound source separation techniques. These methods are based on auditory models and filters and are widely used to simulate human perception and in speech detection. Furthermore, a standardised set of features from the MPEG-7 instrument timbre description scheme is calculated from the harmonic spectrum of a sound. This scheme is specifically designed to capture the timbre of instrument tones. Two experiments are discussed in this thesis. The first experiment evaluates features and classification methods on a dataset of single-tone samples from 20 instruments. A combination of perceptual features achieves the highest accuracy, at around 90%, and a reduced collection of selected features proves most efficient. The second experiment is similar to the first, but it processes musical passages to distinguish between instruments from four major instrument families. Spectral features are predominantly used for this task. Nineteen features combined from the Mel-frequency cepstral coefficients and the MPEG-7 standard achieve an accuracy of around 94%. In both experiments a multilayer perceptron shows the best generalisation on the test set.
The dataset from the second experiment is also used for a user interface that implements the most robust feature extraction technique and the fastest data analysis method. It detects the dominant instrument in a ten-second passage and annotates the calculated information in XML format. [en_NZ]
dc.subject: classical instruments [en_NZ]
dc.subject: isolated tones [en_NZ]
dc.subject: data analysis [en_NZ]
dc.subject: distinguish between instruments [en_NZ]
dc.subject: Mel-frequency cepstral coefficient [en_NZ]
dc.subject.lcsh: Q Science (General) [en_NZ]
dc.subject.lcsh: M Music [en_NZ]
dc.title: Detecting instruments in classical music: A view on timbre, musical features, and classification [en_NZ]
otago.schoolInformation Scienceen_NZ Scienceen_NZ of Science of Otagoen_NZ Dissertationsen_NZ
otago.openaccess: Abstract Only
dc.identifier.eprints335en_NZ Systems Research Laboratoryen_NZ Scienceen_NZ

Files in this item


There are no files associated with this item.

This item is not available in full-text via OUR Archive.

If you would like to read this item, please apply for an inter-library loan from the University of Otago via your local library.

If you are the author of this item, please contact us if you wish to discuss making the full text publicly available.
