Academic Journal

Materials information extraction via automatically generated corpus

Bibliographic Details
Title: Materials information extraction via automatically generated corpus
Authors: Rongen Yan, Xue Jiang, Weiren Wang, Depeng Dang, Yanjing Su
Source: Scientific Data, Vol 9, Iss 1, Pp 1-12 (2022)
Publication Information: Nature Portfolio, 2022.
Publication Year: 2022
Collection: LCC:Science
Subject Terms: Science
Description: Abstract: Information Extraction (IE) in Natural Language Processing (NLP) aims to extract structured information from unstructured text to help computers understand natural language. Machine learning-based IE methods bring more intelligence and possibilities but require an extensive and accurately labeled corpus. In the materials science domain, producing reliable labels is a laborious task that requires the efforts of many professionals. To reduce manual intervention and automatically generate a materials corpus during IE, we propose a semi-supervised IE framework for materials via an automatically generated corpus. Taking the superalloy data extraction in our previous work as an example, the proposed framework uses Snorkel to automatically label the corpus containing property values. An Ordered Neurons-Long Short-Term Memory (ON-LSTM) network is then adopted to train an information extraction model on the generated corpus. The experimental results show that the F1-scores for the γ′ solvus temperature, density, and solidus temperature of superalloys are 83.90%, 94.02%, and 89.27%, respectively. Furthermore, we conduct similar experiments on other materials; the results show that the proposed framework is universal in the field of materials.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2052-4463
Relation: https://doaj.org/toc/2052-4463
DOI: 10.1038/s41597-022-01492-2
Open Access: https://doaj.org/article/685ec4444d89427a8237254e22eaa7ae
Accession Number: edsdoj.685ec4444d89427a8237254e22eaa7ae
Database: Directory of Open Access Journals