A study on the relevance of generic word embeddings for sentence classification in hepatic surgery

Bibliographic details
Title: A study on the relevance of generic word embeddings for sentence classification in hepatic surgery
Authors: Oukelmoun, Achir; Semmar, Nasredine; de Chalendar, Gaël; Habran, Enguerrand; Vibert, Eric; Goblet, Emma; Oukelmoun, Mariame; Allard, Marc-Antoine
Contributors: Laboratoire Analyse Sémantique Textes et Images (LASTI), Département Intelligence Ambiante et Systèmes Interactifs (DIASI (CEA, LIST)), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Chaire innovation Bloc OPératoire Augmenté (BOPA), Institut Mines-Télécom Business School (IMT-BS), Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT), Department of Laboratory Medicine, Medical laboratory of Cheikh Zaid hospital, Abulcasis International University of Health Sciences, Morocco
Source: Proceedings of the 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2023) ; AICCSA 2023 - 20th ACS/IEEE International Conference on Computer Systems and Applications ; https://cea.hal.science/cea-04559674 ; AICCSA 2023 - 20th ACS/IEEE International Conference on Computer Systems and Applications, Dec 2023, Gizeh, Egypt. ⟨10.1109/AICCSA59173.2023.10479342⟩ ; https://www.computer.org/csdl/proceedings/aiccsa/2023/1VOAizMiU2k
Publication information: HAL CCSD
Publication year: 2023
Subject terms: Natural Language Processing, Word embeddings, Gradient Boosting, hepatic, surgery, transformers, classifiers, supervised learning, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Subject geographic: Gizeh, Egypt
Description: International audience ; Fine-tuning large contextual language models often demands substantial computational capacity, yet using generic pre-trained models in highly specialized domains can yield suboptimal results. This paper explores an innovative approach for deriving relevant, domain-specific word embeddings with limited computational resources; the methods are tested on French-language text in the domain of hepatic surgery. The setting is one in which computational limitations prohibit fine-tuning large language models. A new embedding, referred to as FTW2V, that combines Word2Vec and FastText is introduced. It addresses the challenge of incorporating terms absent from Word2Vec's vocabulary. Furthermore, a novel method is used to evaluate the relevance of word embeddings within a specialized corpus: it compares the distributions of classification scores of Gradient Boosting classifiers trained on word embeddings derived from benchmarked Natural Language Processing (NLP) models. Under this assessment, the FTW2V model, trained from scratch with limited computational resources, outperforms generic contextual models in word-embedding quality. Additionally, a computationally efficient contextual model built on FTW2V is introduced. This modified model replaces Gradient Boosting with a transformer and integrates part-of-speech labels.
Document type: conference object
Language: English
ISBN: 979-83-503-1943-9
Relation: cea-04559674; https://cea.hal.science/cea-04559674; https://cea.hal.science/cea-04559674/document; https://cea.hal.science/cea-04559674/file/AICCSA_2023_Paper_IEEE_Achir_Oukelmoun_NoteIEEE.pdf
DOI: 10.1109/AICCSA59173.2023.10479342
Availability: https://doi.org/10.1109/AICCSA59173.2023.10479342
https://cea.hal.science/cea-04559674
https://cea.hal.science/cea-04559674/document
https://cea.hal.science/cea-04559674/file/AICCSA_2023_Paper_IEEE_Achir_Oukelmoun_NoteIEEE.pdf
Rights: info:eu-repo/semantics/OpenAccess
Accession number: edsbas.E17E0E18
Database: BASE
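
Illustrative note: the description above says FTW2V combines Word2Vec and FastText so that terms absent from Word2Vec's vocabulary can still be embedded, but this record does not say how the two are combined. The following is a minimal sketch of one plausible reading, not the authors' method: use a word-level table when the word is known, and otherwise fall back to an average of character n-gram vectors, as FastText-style models do. All names (`DIM`, `char_ngrams`, `hashed_vector`, `subword_vector`, `embed`, `sentence_vector`) and the toy vectors are hypothetical, and hashed pseudo-random vectors stand in for trained n-gram vectors.

```python
import hashlib

DIM = 8  # toy dimensionality, chosen only for this example


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def hashed_vector(token, dim=DIM):
    """Deterministic stand-in for a trained n-gram vector (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:dim]]


def subword_vector(word, dim=DIM):
    """Average the vectors of the word's character n-grams."""
    grams = char_ngrams(word) or [f"<{word}>"]
    vectors = [hashed_vector(g, dim) for g in grams]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed(word, word_table, dim=DIM):
    """Word-level vector if in vocabulary, subword fallback otherwise."""
    if word in word_table:
        return word_table[word], "word"
    return subword_vector(word, dim), "subword"


def sentence_vector(tokens, word_table, dim=DIM):
    """Mean of token vectors, usable as input features for a classifier."""
    vectors = [embed(t, word_table, dim)[0] for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


if __name__ == "__main__":
    # Toy word-level table standing in for a trained Word2Vec vocabulary.
    table = {"foie": [0.1] * DIM, "du": [0.0] * DIM}
    for word in ["foie", "hépatectomie"]:
        vector, source = embed(word, table)
        print(word, source, len(vector))
    print(len(sentence_vector(["hépatectomie", "du", "foie"], table)))
```

In this sketch an out-of-vocabulary term such as "hépatectomie" still receives a vector through its n-grams, and the averaged sentence vector is the kind of fixed-length feature a Gradient Boosting classifier could be trained on.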