Discrimination between singing and speech in real-world audio

Bibliographic details
Title: Discrimination between singing and speech in real-world audio
Authors: Brian Thompson
Source: SLT
Publication information: IEEE, 2014.
Publication year: 2014
Subject terms: Discriminative model, Computer science, Speech recognition, Feature vector, A priori and a posteriori, Fundamental frequency, Singing, Natural language, Multiplicative noise, Spoken language
Description: The performance of a spoken language system suffers when non-speech is incorrectly classified as speech. Singing is particularly difficult to discriminate from speech, since both are natural language. However, singing conveys a melody, whereas speech does not; in particular, a singer's fundamental frequency should not deviate significantly from an underlying sequence of notes, while a speaker's fundamental frequency is freer to deviate about a mean value. The present work presents a novel approach to discrimination between singing and speech that exploits the distribution of such deviations. The melody in singing is typically not known a priori, so the distribution cannot be measured directly. Instead, an approximation to its Fourier transform is proposed that allows the unknown melody to be treated as multiplicative noise. This feature vector is shown to be highly discriminative between speech and singing segments when coupled with a simple maximum likelihood classifier, outperforming prior work on real-world data.
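The description does not spell out how the feature is constructed, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes (these assumptions are not in the record) that pitch is measured in semitones, that sung notes fall on an equal-tempered semitone grid, and that deviations are independent of the notes. Under those assumptions the observed pitch is note plus deviation, so the Fourier transform of its distribution (the characteristic function) is a product of a melody term and a deviation term. At angular frequencies 2πk with integer k, the melody term has unit magnitude, so the magnitude of the empirical characteristic function depends only on the deviations. The function names, the 440 Hz reference, and the synthetic data are all hypothetical.

```python
import numpy as np

def hz_to_semitones(f0_hz, ref_hz=440.0):
    """Convert fundamental frequency in Hz to semitones relative to ref_hz.

    ref_hz is an arbitrary assumption; a constant tuning offset does not
    change the magnitudes computed below.
    """
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def empirical_cf_magnitude(pitch_semitones, harmonics=(1, 2, 3)):
    """Magnitude of the empirical characteristic function at 2*pi*k.

    For pitch = integer note + deviation, exp(2j*pi*k*note) equals 1,
    so only the deviation distribution affects the result.
    """
    p = np.asarray(pitch_semitones, dtype=float)
    return np.array(
        [np.abs(np.mean(np.exp(2j * np.pi * k * p))) for k in harmonics]
    )

# Synthetic data, for illustration only (not from the paper).
rng = np.random.default_rng(0)

# "Singing": integer semitone notes, a constant tuning offset, small deviations.
notes = rng.choice([0, 2, 4, 5, 7, 9, 11, 12], size=2000)
sung = notes + 0.3 + rng.normal(0.0, 0.1, size=2000)

# "Speech": pitch wandering widely about a mean value, no note grid.
spoken = 5.0 + rng.normal(0.0, 2.5, size=2000)

sung_feat = empirical_cf_magnitude(sung)
spoken_feat = empirical_cf_magnitude(spoken)

print("singing feature:", sung_feat)
print("speech feature: ", spoken_feat)
```

In this toy setting the singing feature stays well away from zero while the speech feature collapses toward it, which is the kind of separation a simple likelihood-based classifier could exploit. Real recordings would also need a pitch tracker and voicing decisions, which this sketch leaves out.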
Open access: https://explore.openaire.eu/search/publication?articleId=doi_________::edb80de164c347ec81d289c022dfc8a1
https://doi.org/10.1109/slt.2014.7078609
Accession number: edsair.doi...........edb80de164c347ec81d289c022dfc8a1
Database: OpenAIRE