Academic Journal

Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction.

Bibliographic Details
Title: Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction.
Authors: Le, Hoang-Quynh; Tran, Mai-Vu; Dang, Thanh Hai; Ha, Quang-Thuy; Collier, Nigel
Publication Information: //dx.doi.org/10.1093/database/baw102
the Journal of Biological Databases and Curation
Oxford University Press
Publication Year: 2016
Collection: Apollo - University of Cambridge Repository
Subject Terms: Animals, Chemically-Induced Disorders, Data Mining, Humans, Models, Theoretical, Support Vector Machine
Description: The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR. The original UET-CAM system's performance was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module utilized a multi-pass sieve to extend entity recall. In this article, the UET-CAM system was improved by adding a 'silver' CID corpus to train the prediction model. This silver-standard corpus of more than 50 thousand sentences was built automatically from the Comparative Toxicogenomics Database (CTD). We evaluated our method on the CDR test set. Results showed that our system could reach state-of-the-art performance, with F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits of both the multi-pass sieve coreference resolution method (F1 +4.13%) and the silver CID corpus (F1 +7.3%). Database URL: SilverCID, the silver-standard corpus for CID relation extraction, is freely available online at: https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530). ; H-Q.L. and T.H.D. gratefully acknowledge funding support from Vietnam National University, Hanoi (VNU), under Project No. QG.15.21. N.C. gratefully acknowledges funding support from the UK EPSRC (grant number EP/M005089/1).
Funding for open access charge: VNUH Project No. QG.15.21.
Document Type: article in journal/newspaper
File Description: application/pdf
Language: English
Relation: https://www.repository.cam.ac.uk/handle/1810/292319
DOI: 10.17863/CAM.39470
Availability: https://doi.org/10.17863/CAM.39470
https://www.repository.cam.ac.uk/handle/1810/292319
Rights: Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.D5FDF78A
Database: BASE