An Investigation of Incorporating Mamba for Speech Enhancement

Bibliographic Details
Title: An Investigation of Incorporating Mamba for Speech Enhancement
Authors: Chao, Rong; Cheng, Wen-Huang; La Quatra, Moreno; Siniscalchi, Sabato Marco; Yang, Chao-Han Huck; Fu, Szu-Wei; Tsao, Yu
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: This work studies Mamba, a scalable state-space model (SSM), for the speech enhancement (SE) task. We use a Mamba-based regression model to characterize speech signals and build an SE system on top of Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, training with signal-level distances as well as metric-oriented loss functions. SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset. When combined with the perceptual contrast stretching technique, the proposed SEMamba yields a new state-of-the-art PESQ score of 3.69.
Document Type: Working Paper
Open Access: http://arxiv.org/abs/2405.06573
Accession Number: edsarx.2405.06573
Database: arXiv