Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech

Bibliographic Details
Title: Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech
Authors: H. Kaymaz Keskinpala, Thaweesak Yingthawornsuk, D. Mitchell Wilkes, Richard Shiavi, Ronald M. Salomon
Source: INTERSPEECH
Publication Information: ISCA, 2007.
Publication Year: 2007
Subject Terms: Speech production, Formant, Computer science, Speech recognition, Cepstrum, Feature extraction, otorhinolaryngologic diseases, Feature selection, VOCAL PARAMETERS, Prosody, Speech processing, Emotional arousal, Vocal tract
Description: Research has shown that the voice itself contains important information about immediate psychological state, and that certain vocal parameters are capable of distinguishing speaking patterns of speech affected by emotional disturbances (e.g., clinical depression). In this study, a GMM-based feature of the vocal tract system response, together with spectral energy, was studied and found to be a primary acoustic feature set for separating two groups of female patients carrying diagnoses of depression and suicidal risk.

Index Terms: suicidal speech, depression, vocal tract, energy

1. Introduction

Suicide is a common outcome in persons with serious mental disorders. However, it remains a phenomenon that is under-researched and poorly understood. Moreover, methods to help identify persons at elevated risk are sorely needed in clinical practice. This study represents an attempt to identify characteristic vocal patterns in persons with imminent suicidal potential, which could lead to the development of new technology to aid in the assessment of suicidal potential. The project studies vocal acoustic properties in suicidal states; two groups are contrasted in this work: near-term suicidal and depressed patients. In the early 1980s the Silvermans began to collect and analyze recorded suicide notes and interviews made shortly before suicide attempts. Their results suggested that the voice can provide important information about immediate psychological state. They described depressed patients as having the same vocal speech as suicidal patients, but the tonal quality of speech changes significantly when patients become suicidal. As reported in [1], [2], [3], emotional arousal produces changes in the speech production scheme by affecting the respiratory, phonatory, and articulatory processes, which in turn are encoded in the acoustic signal.
The emotional content of the voice can be associated with acoustical variables such as the level, range, contour, and perturbation of the fundamental frequency, the distribution of energy in the frequency spectrum, the location, bandwidth, and intensity of formant frequencies, and a variety of temporal measures. Measurable changes in vocal parameters caused by emotional disturbances can be evaluated with an appropriate speech processing approach and suitable acoustic features. Research has shown that depression has a major effect on the acoustic characteristics of the voice when compared to normal controls, and certain changes in the acoustic properties of affective speech are possibly specific to near-term suicidal states. In the published pilot studies [4], [6], analytical techniques were developed to determine whether subjects were in one of three mental states: healthy control, non-suicidal depressed, or high-risk suicidal. Several studies have used vocal tract (VT) measures (i.e., formants) and prosody to classify emotional disorders. France et al. [4] found the formants and the percentages of total energy in the frequency spectrum over the 0–2,000 Hz range to be the most distinguishing acoustic feature set for classifying groups of control, major depressed, and suicidal subjects. These features were recently re-investigated and extracted from a new speech database recorded in a better-controlled environment; the experimental results showed that the investigated feature set remained a powerful acoustic discriminator for distinguishing suicidal, depressed, and remitted patients [1]. Ozdas et al. [6] used a set of low-order mel-cepstral coefficients to identify speakers diagnosed by a psychiatrist as major depressed, suicidal, or normal; the reported classification performance, as a measure of group separation, was significantly high. Moore et al.
compared speaking-pattern recognition results obtained with prosody, formant, and glottal ratio/spectrum features in classifying normal controls and depressed patients; the optimal classifiers, designated by the glottal ratio/spectrum and formants, separated the two groups most effectively [7]. This work focuses on the characterization of the vocal tract system and the distribution of energy in the frequency spectrum of the speech signal. A speech processing algorithm is proposed and implemented to extract vocal features representing the characteristics of the VT system response: an estimate of the smoothed magnitude spectrum is determined via cepstral analysis, and the spectral structure contained in that magnitude spectrum is modeled by a mixture of Gaussian density components whose model parameters are estimated via the well-known Expectation-Maximization (EM) algorithm. This paper is organized as follows: Section 2 describes the database, feature extraction, primary feature selection, and performance evaluation; Section 3 presents the results; Section 4 concludes with the findings of this work.
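The two-stage idea described in the abstract (cepstral smoothing of the magnitude spectrum, then modeling the spectral structure as a Gaussian mixture over frequency fitted with EM) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the FFT size, lifter cutoff, number of mixture components, quantile-based initialization, and the magnitude-weighted EM formulation are all choices made for the example.

```python
import numpy as np

def smoothed_log_spectrum(frame, n_fft=512, n_lifter=30):
    """Cepstrally smoothed log-magnitude spectrum of one speech frame.
    Low-time liftering keeps the spectral envelope (vocal tract shape)
    and discards the fine harmonic structure."""
    spec = np.log(np.abs(np.fft.rfft(frame, n_fft)) + 1e-10)
    ceps = np.fft.irfft(spec, n_fft)          # real cepstrum
    ceps[n_lifter:n_fft - n_lifter] = 0.0     # keep low-quefrency part
    return np.fft.rfft(ceps, n_fft).real      # smoothed log spectrum

def fit_gmm_to_spectrum(freqs, mag, n_components=3, n_iter=100):
    """Treat the normalized magnitude spectrum as a density over
    frequency and fit a 1-D Gaussian mixture with weighted EM.
    Returns mixture weights, means (Hz), and standard deviations."""
    w = mag / mag.sum()
    # Initialize means at weighted quantiles of the spectrum (a common
    # heuristic; the paper does not specify its initialization).
    cdf = np.cumsum(w)
    qs = (np.arange(n_components) + 0.5) / n_components
    mu = np.interp(qs, cdf, freqs)
    var = np.full(n_components, ((freqs[-1] - freqs[0]) / (2 * n_components)) ** 2)
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin
        d = freqs[:, None] - mu[None, :]
        pdf = pi * np.exp(-0.5 * d ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        r = pdf / (pdf.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: magnitude-weighted parameter updates
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        mu = (rw * freqs[:, None]).sum(axis=0) / nk
        var = (rw * (freqs[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return pi, mu, np.sqrt(var)
```

The fitted means and bandwidths then serve as a compact parametric description of the smoothed spectrum, playing a role analogous to formant locations and bandwidths.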
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_________::61ace4d40b73459670b1b2a3bc5d0c75
https://doi.org/10.21437/interspeech.2007-144
Accession Number: edsair.doi...........61ace4d40b73459670b1b2a3bc5d0c75
Database: OpenAIRE