Academic Journal

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Bibliographic Details
Title: Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Authors: Mohamed Elhadi
Source: Journal of Advances in Computer Engineering and Technology, Vol 5, Iss 2, Pp 117-128 (2019)
Publication Details: Science and Research Branch, Islamic Azad University, 2019.
Publication Year: 2019
Collection: LCC:Technology (General)
LCC:Science
Subject Terms: arabic text classification, tfidf-vector space model, news articles, corpora creation, Technology (General), T1-995, Science
Description: Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up the classification process for any text categorization task. It also serves as a tool for creating Arabic text corpora. In particular, we create a text classification process for Arabic news articles downloaded from web news portals and sites. The suggested procedure is a pilot project that uses a human-predefined set of documents that have been assigned to subjects or categories. Vectorized Term Frequency-Inverse Document Frequency (TF-IDF) information processing was used for the initial verification of the categories. The resulting validated categories were then used to predict categories for new documents. The experiment used 1000 initial documents pre-assigned to five categories, with 200 documents in each category. An initial set of 2195 documents was downloaded from a number of Arabic news sources. These were pre-processed for use in testing the utility of the suggested classification procedure, with cosine similarity as the classifier. Results were very encouraging, with very satisfying precision, recall, and F1-score. It is the intention of the authors to improve the procedure and to use it for Arabic corpora creation.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2423-4192
2423-4206
Relation: http://jacet.srbiau.ac.ir/article_14021_36790a4dd3aca3627c93112fe6fca730.pdf; https://doaj.org/toc/2423-4192; https://doaj.org/toc/2423-4206
Open Access: https://doaj.org/article/e143f5d7a7e24367998872be6f8f9bef
Accession Number: edsdoj.143f5d7a7e24367998872be6f8f9bef
Database: Directory of Open Access Journals
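The description outlines a TF-IDF vector space model built from human-assigned seed documents, with cosine similarity used to assign categories to new articles. The following is a minimal sketch of that idea under stated assumptions, not the author's implementation. Several details are assumptions because the record does not specify them. Each category is represented by the centroid of its L2-normalized seed TF-IDF vectors, which is an assumption since new documents might instead be compared with individual seed documents. The smoothed IDF formula log((1 + N) / (1 + df)) + 1 is also assumed. The whitespace tokenizer is hypothetical, and it stands in for real Arabic preprocessing such as normalization, stop-word removal, and stemming. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical whitespace tokenizer; real Arabic preprocessing is assumed
    # to have been applied before this step.
    return text.split()

def idf_table(docs):
    """Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1 over tokenized docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector (raw counts x IDF); terms unseen in seeds are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def normalize(vec):
    """Scale a sparse vector to unit L2 length (empty if the vector is all zeros)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_centroids(seed_docs):
    """seed_docs: {category: [text, ...]} -> (idf, {category: centroid vector})."""
    tokenized = {c: [tokenize(d) for d in docs] for c, docs in seed_docs.items()}
    idf = idf_table([t for docs in tokenized.values() for t in docs])
    centroids = {}
    for cat, docs in tokenized.items():
        total = defaultdict(float)
        for tokens in docs:
            for t, w in normalize(tfidf(tokens, idf)).items():
                total[t] += w
        # Average of unit-length seed vectors (assumes each category is non-empty).
        centroids[cat] = {t: w / len(docs) for t, w in total.items()}
    return idf, centroids

def classify(text, idf, centroids):
    """Return the category whose centroid has the highest cosine similarity."""
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

if __name__ == "__main__":
    # Toy seed set (illustrative only; not the paper's data).
    seeds = {
        "sports": ["فريق كرة القدم فاز بالمباراة", "اللاعب سجل هدفا في المباراة"],
        "economy": ["ارتفع سعر النفط في السوق", "البنك المركزي خفض سعر الفائدة"],
    }
    idf, centroids = build_centroids(seeds)
    print(classify("المباراة انتهت بفوز الفريق", idf, centroids))
```

Averaging normalized vectors keeps long seed articles from dominating a category's centroid. If a new article shares no terms with any seed document, every similarity is 0, and `max` simply returns the first category. A real system would likely flag such documents for manual review instead.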