GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

التفاصيل البيبلوغرافية
العنوان: GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
المؤلفون: Gehrmann, Sebastian, Bhattacharjee, Abhik, Mahendiran, Abinaya, Wang, Alex, Papangelis, Alexandros, Madaan, Aman, McMillan-Major, Angelina, Shvets, Anna, Upadhyay, Ashish, Bohnet, Bernd, Yao, Bingsheng, Wilie, Bryan, Bhagavatula, Chandra, You, Chaobin, Thomson, Craig, Garbacea, Cristina, Wang, Dakuo, Deutsch, Daniel, Xiong, Deyi, Jin, Di, Gkatzia, Dimitra, Radev, Dragomir, Clark, Elizabeth, Durmus, Esin, Ladhak, Faisal, Ginter, Filip, Winata, Genta Indra, Strobelt, Hendrik, Hayashi, Hiroaki, Novikova, Jekaterina, Kanerva, Jenna, Chim, Jenny, Zhou, Jiawei, Clive, Jordan, Maynez, Joshua, Sedoc, João, Juraska, Juraj, Dhole, Kaustubh, Chandu, Khyathi Raghavi, Perez-Beltrachini, Laura, Ribeiro, Leonardo F.R., Tunstall, Lewis, Zhang, Li, Pushkarna, Mahima, Creutz, Mathias, White, Michael, Kale, Mihir Sanjay, Eddine, Moussa Kamal, Daheim, Nico, Subramani, Nishant, Dusek, Ondrej, Liang, Paul Pu, Ammanamanchi, Pawan Sasanka, Zhu, Qi, Puduppully, Ratish, Kriz, Reno, Shahriyar, Rifat, Cardenas, Ronald, Mahamood, Saad, Osei, Salomey, Cahyawijaya, Samuel, Štajner, Sanja, Montella, Sebastien, Jolly, Shailza, Mille, Simon, Hasan, Tahmid, Shen, Tianhao, Adewumi, Tosin, Raunak, Vikas, Raheja, Vipul, Nikolaev, Vitaly, Tsai, Vivian, Jernite, Yacine, Xu, Ying, Sang, Yisi, Liu, Yixin, Hou, Yufang
المصدر: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. :266-281
مصطلحات موضوعية: Maskininlärning, Machine Learning
الوصف: Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances.We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
وصف الملف: print
الوصول الحر: https://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-96959Test
https://doi.org/10.18653/v1/2022.emnlp-demos.27Test
قاعدة البيانات: SwePub