Information Extraction from Invoices

Bibliographic Details
Title: Information Extraction from Invoices
Authors: Hamdi, Ahmed; Carel, Elodie; Joseph, Aurélie; Coustaty, Mickael; Doucet, Antoine
Contributors: Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), La Rochelle Université (ULR), Yooz ITESOFT-Yooz Group
Source: International Conference on Document Analysis and Recognition ICDAR 2021 ; https://hal.science/hal-03418385 ; International Conference on Document Analysis and Recognition ICDAR 2021, Sep 2021, Lausanne, Switzerland. pp.699-714, ⟨10.1007/978-3-030-86331-9_45⟩
Publisher Information: HAL CCSD
Springer International Publishing
Publication Year: 2021
Collection: HAL - Université de La Rochelle
Subject Terms: invoices, data extraction, features, neural networks, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Subject Geographic: Lausanne, Switzerland
Description: International audience ; The present paper focuses on extracting information from key fields of invoices using two different methods based on sequence labeling. Invoices are semi-structured documents in which data can be located based on the context. Common information extraction systems are model-driven, using heuristics and lists of trigger words curated by domain experts. Their performance is generally high on documents they have been trained for, but processing new templates often requires new manual annotations, which are tedious and time-consuming to produce. Recent work on deep learning applied to business documents has claimed gains in both time and performance. While these systems do not need manual curation, they nevertheless require a large amount of data to achieve good results. In this paper, we present a series of experiments using neural network approaches to study the trade-off between data requirements and performance in the extraction of information from key fields of invoices (such as dates, document numbers, types, and amounts). The main contribution of this paper is a system that achieves competitive results using a small amount of data, compared to state-of-the-art systems that need to be trained on large datasets, which are costly and impractical to produce in real-world applications.
Document Type: conference object
Language: English
Relation: hal-03418385; https://hal.science/hal-03418385; https://hal.science/hal-03418385/document; https://hal.science/hal-03418385/file/ICDAR2021-Information%20Extraction%20from%20Invoices.pdf
DOI: 10.1007/978-3-030-86331-9_45
Availability: https://doi.org/10.1007/978-3-030-86331-9_45
https://hal.science/hal-03418385
https://hal.science/hal-03418385/document
https://hal.science/hal-03418385/file/ICDAR2021-Information%20Extraction%20from%20Invoices.pdf
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.984C016C
Database: BASE
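
The description above frames invoice field extraction as sequence labeling. As a rough illustration only, the sketch below shows how per-token labels could be grouped into extracted fields. It assumes a BIO tagging scheme and hypothetical field names (DOC_NUMBER, DATE, AMOUNT); the record does not state which tagging scheme, label set, or models the paper actually uses, and the tags here are written by hand rather than predicted by a network.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (field_name, text) pairs.

    Assumption: tags follow the BIO convention ("B-X" starts field X,
    "I-X" continues it, "O" is outside any field). An "I-X" tag that does
    not continue an open field X is treated like "O".
    """
    fields: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the field currently being built, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            fields.append((current_label, " ".join(current_tokens)))
        current_label = None
        current_tokens = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()

    flush()
    return fields


# Hypothetical invoice line with hand-written labels (not model output).
tokens = ["Invoice", "No", "INV-0042", "dated", "12", "March", "2021", "Total", "150.00"]
tags = ["O", "O", "B-DOC_NUMBER", "O", "B-DATE", "I-DATE", "I-DATE", "O", "B-AMOUNT"]

for field, text in decode_bio(tokens, tags):
    print(f"{field}: {text}")
```

In a learned system, the `tags` list would come from a sequence labeling model rather than being supplied manually; the decoding step that turns labels into field values stays the same.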