Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Bibliographic details
Title: Dual-path Transformer Based Neural Beamformer for Target Speech Extraction
Authors: Guo, Aoqi; Qian, Sichong; Li, Baoxiang; Gao, Dazhi
Publication year: 2023
Collection: Computer Science
Subject terms: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-attention mechanism in the time domain to extract crucial spatial information related to beamforming from the noisy covariance matrix. Subsequently, in the frequency domain, the self-attention mechanism is employed to enhance the model's ability to process frequency-specific details. By design, our model circumvents the influence of pre-separation modules, delivering performance in a more comprehensive end-to-end manner. Experimental results reveal that our model not only outperforms contemporary leading neural beamforming algorithms in separation performance but also achieves this with a significant reduction in parameter count.
Document type: Working Paper
Open access: http://arxiv.org/abs/2308.15990
Accession number: edsarx.2308.15990
Database: arXiv
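
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the description mentions cross-attention along the time axis over noisy covariance features, followed by self-attention along the frequency axis. The pure-Python toy below shows only that data flow. Everything in it is an assumption made for illustration: the function names, the per-frame outer-product covariance, the externally supplied `queries`, and the absence of learned projections, multi-head attention, normalization, or any beamforming weight estimation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of real-valued vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return out


def covariance_features(frame):
    """Flatten the outer product x x^H of one multichannel frame into real/imag parts."""
    feats = []
    for a in frame:
        for b in frame:
            c = a * b.conjugate()
            feats.extend([c.real, c.imag])
    return feats


def dual_path(stft, queries):
    """Toy dual-path pass (hypothetical helper).

    stft[f][t]    : list of complex channel values at frequency f, frame t
    queries[f][t] : real vector of length 2 * channels**2 (placeholder queries)
    """
    n_freq, n_time = len(stft), len(stft[0])
    cov = [[covariance_features(stft[f][t]) for t in range(n_time)]
           for f in range(n_freq)]

    # Time path: for each frequency bin, cross-attend over frames,
    # with the covariance features serving as keys and values.
    time_out = [attention(queries[f], cov[f], cov[f]) for f in range(n_freq)]

    # Frequency path: for each frame, self-attend across frequency bins.
    out = [[None] * n_time for _ in range(n_freq)]
    for t in range(n_time):
        seq = [time_out[f][t] for f in range(n_freq)]
        mixed = attention(seq, seq, seq)
        for f in range(n_freq):
            out[f][t] = mixed[f]
    return out
```

Calling `dual_path` on a small two-channel input returns one feature vector per time-frequency bin, with the same length as the flattened covariance (8 values for two channels). A real model would map such features to beamforming weights through trained layers, which this sketch leaves out.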