A polyphone BERT for Polyphone Disambiguation in Mandarin Chinese

Bibliographic Details
Title: A polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
Authors: Zhang, Song; Zheng, Ken; Zhu, Xiaoxu; Li, Baoxiang
Source: Interspeech 2022.
Publication Information: ISCA, 2022.
Publication Year: 2022
Subject Terms: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)
Description: Grapheme-to-phoneme (G2P) conversion is an indispensable part of a Mandarin Chinese text-to-speech (TTS) system, and the core of G2P conversion is polyphone disambiguation: picking the correct pronunciation from several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. First, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters, one per pronunciation. Then we obtain a Chinese polyphone BERT by extending a pre-trained Chinese BERT with the 741 new monophonic characters and adding a corresponding embedding layer for the new tokens, which is initialized with the embeddings of the source polyphonic characters. In this way, we turn the polyphone disambiguation task into a pre-training task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model: the polyphone BERT model obtains a 2% improvement in average accuracy (from 92.1% to 94.1%) over the BERT-based classifier model, which is the prior state of the art in polyphone disambiguation.
Comment: Accepted for INTERSPEECH 2022
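Illustrative sketch (not part of the record): the vocabulary-extension step summarized in the description can be pictured roughly as follows. This is a minimal pure-Python sketch, not the authors' implementation. The function name extend_vocab_with_monophones, the "character_pinyin" token naming, the toy vocabulary and the toy embedding values are all assumptions made for illustration; a real system would operate on a pre-trained Chinese BERT's vocabulary and embedding matrix.

```python
def extend_vocab_with_monophones(vocab, embeddings, polyphone_readings):
    """Add one new monophonic token per reading of each polyphonic character.

    vocab: dict mapping token -> row index in `embeddings`
    embeddings: list of embedding vectors (lists of floats)
    polyphone_readings: dict mapping a polyphonic character to its readings

    Each new token's embedding is initialized as a copy of the embedding of
    its source polyphonic character. The "char_reading" naming scheme is a
    hypothetical convention chosen for this sketch.
    """
    new_vocab = dict(vocab)
    new_embeddings = [list(row) for row in embeddings]
    for char, readings in polyphone_readings.items():
        source_row = embeddings[vocab[char]]
        for reading in readings:
            token = f"{char}_{reading}"
            new_vocab[token] = len(new_embeddings)
            new_embeddings.append(list(source_row))  # independent copy
    return new_vocab, new_embeddings


# Toy data (assumed values, for illustration only).
vocab = {"[PAD]": 0, "行": 1, "银": 2}
embeddings = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
]
# "行" is read as xing2 or hang2 depending on context (among other readings).
polyphone_readings = {"行": ["xing2", "hang2"]}

new_vocab, new_embeddings = extend_vocab_with_monophones(
    vocab, embeddings, polyphone_readings
)
```

With the new tokens in place, disambiguation can then be framed as predicting which monophonic token should stand in for the polyphonic character in context, which is how the description characterizes the task as a pre-training-style objective.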
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::59a05062ffdacfc3f549e8a792213670
https://doi.org/10.21437/interspeech.2022-229
Rights: OPEN
Accession Number: edsair.doi.dedup.....59a05062ffdacfc3f549e8a792213670
Database: OpenAIRE