Conference
Information Extraction from Invoices
Title: | Information Extraction from Invoices |
---|---|
Authors: | Hamdi, Ahmed; Carel, Elodie; Joseph, Aurélie; Coustaty, Mickael; Doucet, Antoine |
Contributors: | Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), La Rochelle Université (ULR), Yooz ITESOFT-Yooz Group |
Source: | International Conference on Document Analysis and Recognition ICDAR 2021 ; https://hal.science/hal-03418385 ; International Conference on Document Analysis and Recognition ICDAR 2021, Sep 2021, Lausanne, Switzerland. pp.699-714, ⟨10.1007/978-3-030-86331-9_45⟩ |
Publisher: | HAL CCSD; Springer International Publishing |
Publication year: | 2021 |
Collection: | HAL - Université de La Rochelle |
Subject terms: | invoices, data extraction, features, neural networks, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing |
Geographic subject: | Lausanne, Switzerland |
Description: | International audience ; The present paper focuses on information extraction from key fields of invoices using two different methods based on sequence labeling. Invoices are semi-structured documents in which data can be located based on the context. Common information extraction systems are model-driven, using heuristics and lists of trigger words curated by domain experts. Their performance is generally high on documents they have been trained for, but processing new templates often requires new manual annotations, which are tedious and time-consuming to produce. Recent works on deep learning applied to business documents claimed a gain in terms of time and performance. While these systems do not need manual curation, they nevertheless require a large amount of data to achieve good results. In this paper, we present a series of experiments using neural network approaches to study the trade-off between data requirements and performance in the extraction of information from key fields of invoices (such as dates, document numbers, types, and amounts). The main contribution of this paper is a system that achieves competitive results using a small amount of data, compared to state-of-the-art systems that need to be trained on large datasets, which are costly and impractical to produce in real-world applications. |
Document type: | conference object |
Language: | English |
Relation: | hal-03418385; https://hal.science/hal-03418385; https://hal.science/hal-03418385/document; https://hal.science/hal-03418385/file/ICDAR2021-Information%20Extraction%20from%20Invoices.pdf |
DOI: | 10.1007/978-3-030-86331-9_45 |
Availability: | https://doi.org/10.1007/978-3-030-86331-9_45; https://hal.science/hal-03418385; https://hal.science/hal-03418385/document; https://hal.science/hal-03418385/file/ICDAR2021-Information%20Extraction%20from%20Invoices.pdf |
Rights: | info:eu-repo/semantics/OpenAccess |
Accession number: | edsbas.984C016C |
Database: | BASE |