Abstract
In recent years, nonverbal vocalisation has become a modality of interest in the study of emotion communication. Previous literature has often characterised such vocalisations as expressions of internal states and suggested that they communicate either basic emotions or features of core affect, depending on the conception of emotion adopted. Research using expression models has, however, generated conflicting findings on the nature and number of nonverbal emotional vocalisations. This thesis explores the hypothesis that these vocalisations are ethological signals, or displays that affect the behaviour of others, which have evolved in parallel with the responses to them.
This hypothesis prompted previously under-explored research questions. First, how many emotions does the voice signal, and how is such a signalling system organised? To address this question, Experiment 1 collected vocalisations with the aim of capturing the breadth of nonverbal emotional vocalisation. Prompt scenarios were designed in a Pilot Experiment to elicit a maximised list of target emotions, with social contexts that included a potential responder. Participants agreed that the scenarios evoked the target emotions in a replicated forced-choice emotion matching task. Acoustic analysis of the vocalisations prompted by these scenarios revealed 30 emotions with significantly different vocal signals.
Second, what are the acoustic features of these vocal signals? Experiment 2 addressed this question by collecting and analysing a corpus of vocalisations for the 30 emotions. Fundamental frequency (F0), F0 standard deviation, the first three formant frequencies, harmonics-to-noise ratio (HNR), intensity, jitter, and shimmer were found to be significant predictors of emotion. If nonverbal emotional vocalisations are ethological signals, their acoustic features should be shaped by their ideal behavioural responses. I therefore predicted that signals seeking approach responses would be harmonic with high frequencies, whereas those seeking avoidance responses would be noisy with low frequencies. The predictions for harmonicity and intensity were supported for many emotions, but the results for some frequency features were unexpected. Experiment 3 aimed to validate the corpus of vocalisations in a playback experiment, and participants recognised the target emotions of the vocalisations at above-chance levels. Experiment 3 also identified a number of acoustic features significantly associated with participants’ successful identification of target emotions.
Third, how are vocal signals perceived? Experiment 4 investigated whether the vocalisations for six emotions were categorically perceived. Stepwise continua were created for each emotion by synthesising highly recognised vocalisations collected in Experiment 2 and transforming each of them by an acoustic feature associated with recognition accuracy in Experiment 3. These stimuli were played to participants to observe whether there were abrupt changes in their recognition across the continua. No evidence of categorical perception was found; however, original vocalisations were correctly recognised significantly more often than all synthesised and transformed vocalisations, which suggested issues with the transformation process.
Together these findings indicated that there are a large number of distinct nonverbal vocalisations associated with emotion, and that their acoustic features are consistent with their ideal responses. I interpret this as support for the Signal Theory of Vocal Emotion, and discuss how this theory could be further interrogated.