Academic Journal

Towards Understanding Neural Machine Translation with Attention Heads’ Importance

Bibliographic Details
Title: Towards Understanding Neural Machine Translation with Attention Heads’ Importance
Authors: Zijie Zhou, Junguo Zhu, Weijiang Li
Source: Applied Sciences, Vol 14, Iss 7, p 2798 (2024)
Publication Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Technology
LCC:Engineering (General). Civil engineering (General)
LCC:Biology (General)
LCC:Physics
LCC:Chemistry
Subject Terms: neural machine translation, interpretability, linguistics, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Description: Although neural machine translation has made great progress, and the Transformer has advanced the state-of-the-art in various language pairs, the decision-making process of the attention mechanism, a crucial component of the Transformer, remains unclear. In this paper, we propose to understand the model’s decisions by the attention heads’ importance. We explore the knowledge acquired by the attention heads, elucidating the decision-making process through the lens of linguistic understanding. Specifically, we quantify the importance of each attention head by assessing its contribution to neural machine translation performance, employing a Masking Attention Heads approach. We evaluate the method and investigate the distribution of attention heads’ importance, as well as its correlation with part-of-speech contribution. To understand the diverse decisions made by attention heads, we concentrate on analyzing multi-granularity linguistic knowledge. Our findings indicate that specialized heads play a crucial role in learning linguistics. By retaining important attention heads and removing the unimportant ones, we can optimize the attention mechanism. This optimization leads to a reduction in the number of model parameters and an increase in the model’s speed. Moreover, by leveraging the connection between attention heads and multi-granular linguistic knowledge, we can enhance the model’s interpretability. Consequently, our research provides valuable insights for the design of improved NMT models.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2076-3417
Relation: https://www.mdpi.com/2076-3417/14/7/2798; https://doaj.org/toc/2076-3417
DOI: 10.3390/app14072798
Open Access: https://doaj.org/article/c5e94c079a484c5aa8008ea1561555b2
Accession Number: edsdoj.5e94c079a484c5aa8008ea1561555b2
Database: Directory of Open Access Journals