Search results - "Dewey Decimal Classification"

1

Academic journal

Automatic Classification of Swedish Metadata Using Dewey Decimal Classification : A Comparison of Approaches

Authors: Golub, Koraljka, Hagelbäck, Johan, Ardö, Anders

Source: Journal of Data and Information Science; 5(1), pp 18-38 (2020) ; ISSN: 2096-157X

Subject terms: Language Technology (Computational Linguistics), 1D convolutional neural network, Automatic classification, Dewey Decimal Classification, LIBRIS, Machine learning, Multinomial Naïve Bayes, Recurrent neural network, Simple linear network, Standard neural network, String matching, Support Vector Machine, Word embeddings

Description: With more and more digital collections of various information resources becoming available, also increasing is the challenge of assigning subject index terms and classes from quality knowledge organization systems. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records which had to be reduced to top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Evaluation shows that Support Vector Machine with linear kernel outperforms other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than Support Vector Machine, but came close, with the benefit of a smaller representation size. Impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often less ...
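
The data reduction described in this abstract (cutting DDC classes to the top three hierarchical levels and requiring a minimum number of records per class) could be sketched roughly as follows. This is only an illustration, not the authors' code: it assumes the top three levels correspond to the first three digits of a DDC notation, and the function names and sample records are hypothetical.

```python
from collections import Counter

def truncate_ddc(ddc: str, levels: int = 3) -> str:
    """Keep only the first `levels` digits of a DDC notation (assumed mapping)."""
    digits = [ch for ch in ddc if ch.isdigit()]
    return "".join(digits[:levels])

def filter_by_class_size(records, min_per_class):
    """Truncate each record's class, then keep records whose class is frequent enough."""
    truncated = [(text, truncate_ddc(ddc)) for text, ddc in records]
    counts = Counter(cls for _, cls in truncated)
    return [(text, cls) for text, cls in truncated if counts[cls] >= min_per_class]

# Hypothetical (title, DDC) pairs, not taken from LIBRIS.
records = [
    ("Svensk poesi", "839.71"),
    ("Svenska romaner", "839.738"),
    ("Linjär algebra", "512.5"),
]
kept = filter_by_class_size(records, min_per_class=2)
print(kept)
```

With a threshold such as 1,000 in place of the toy value 2, the same filter would select the well-populated classes the abstract refers to.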

Relation: https://lup.lub.lu.se/record/25ca0527-90a9-4c7d-9cea-84704f019eaa; http://dx.doi.org/10.2478/jdis-2020-0003; scopus:85085118766

Availability: https://doi.org/10.2478/jdis-2020-0003
https://lup.lub.lu.se/record/25ca0527-90a9-4c7d-9cea-84704f019eaa

2

Book

Automatic classification using DDC on the Swedish union catalogue

Authors: Golub, Koraljka, Hagelbäck, Johan, Ardö, Anders

Source: CEUR Workshop Proceedings; 2200, pp 4-16 (2018) ; ISSN: 1613-0073

Subject terms: Information Studies, Automatic classification, Dewey Decimal Classification, LIBRIS, Machine learning, Multinomial Naïve Bayes, Subject access, Support Vector Machine

Description: With more and more digital collections of various information resources becoming available, also increasing is the challenge of assigning subject index terms and classes from quality knowledge organization systems. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of two machine learning algorithms for Swedish catalogue records from the Swedish union catalogue (LIBRIS). The algorithms are tested on the top three hierarchical levels of the DDC. Based on a data set of 143,838 records, evaluation shows that Support Vector Machine with linear kernel outperforms Multinomial Naïve Bayes algorithm. Also, using keywords or combining titles and keywords gives better results than using only titles as input. The class imbalance where many DDC classes only have few records greatly affects classification performance: 81.37% accuracy on the training set is achieved when at least 1,000 records per class are available, and 66.13% when few records on which to train are available. Proposed future research involves an exploration of the intellectual effort put into creating the DDC to further improve the algorithm performance as commonly applied in string matching, and to test the best approach on new digital collections that do not have DDC assigned.
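
As a rough illustration of one of the two algorithms named in this abstract, the sketch below implements a small Multinomial Naïve Bayes classifier with add-one smoothing, using titles combined with keywords as input. It is not the authors' implementation: the class name `SimpleMultinomialNB`, the `tokenize` helper, and the training records are all hypothetical, and real experiments would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict

def tokenize(title, keywords):
    """Combine a title and its keywords into one lowercase token list."""
    return (title + " " + " ".join(keywords)).lower().split()

class SimpleMultinomialNB:
    """Toy Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training records: (title, keywords, three-digit DDC class).
train = [
    ("Svensk poesi", ["poesi", "lyrik"], "839"),
    ("Svenska romaner", ["romaner", "skönlitteratur"], "839"),
    ("Linjär algebra", ["matriser", "algebra"], "512"),
    ("Abstrakt algebra", ["grupper", "algebra"], "512"),
]
model = SimpleMultinomialNB().fit(
    [tokenize(t, k) for t, k, _ in train], [c for _, _, c in train]
)
print(model.predict(tokenize("Algebra för nybörjare", ["algebra"])))
```

Dropping the keywords from `tokenize` would give the titles-only input that the abstract reports as the weaker feature set.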

Relation: https://lup.lub.lu.se/record/bcc9aee9-3f1d-4b6a-9e3a-df80aabe22a5; scopus:85053933816

Availability: https://lup.lub.lu.se/record/bcc9aee9-3f1d-4b6a-9e3a-df80aabe22a5

3

Thesis

Automated Subject Classification of Textual Documents in the Context of Web-Based Hierarchical Browsing

Authors: Golub, Koraljka

Subject terms: Electrical Engineering, Electronic Engineering, Information Engineering, Dewey Decimal Classification, Engineering Information, hierarchical browsing, controlled vocabularies, thesauri, classification schemes, Automated classification, subject classification, Artificial intelligens, Artificiell intelligens

Description: With the exponential growth of the World Wide Web, automated subject classification has become a major research issue. Organizing web pages into a hierarchical structure for subject browsing has been gaining more recognition as an important tool in information-seeking processes. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if they are similar enough to the former. In the thesis, a string-matching algorithm based on a controlled vocabulary was explored. It does not require training documents, but instead reuses the intellectual work invested into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against text of documents to be classified. Plain string-matching was enhanced in several ways, including term weighting with cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The final results were comparable to those of state-of-the-art machine-learning algorithms, especially for particular classes. Concerning web pages, it was indicated that all the structural information and metadata available in web pages should be used in order to achieve the best automated classification results; however, the exact way of combining them proved not to be very important. In the context of browsing, the biggest difference between three approaches to automated classification (machine learning, information retrieval, library science) is whether they use controlled vocabularies. It has been claimed that well-structured, high-quality classification schemes, such as those used predominantly in library science approaches, could serve as good browsing structures. In the thesis it was shown that Dewey Decimal Classification and Engineering Information classification scheme are suitable for the task. Moreover, a log analysis of a large web-based service using Dewey Decimal Classification ...
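
The string-matching idea in this abstract (matching controlled-vocabulary terms against document text, with term weighting and a cut-off) might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the thesis algorithm: the function name, the weights, the class labels, and the vocabulary entries are hypothetical and are not taken from the Engineering Information thesaurus. Matching here is plain substring counting.

```python
def classify_by_string_matching(text, vocabulary, cutoff=1.0):
    """Score each class by weighted term occurrences; keep classes at or above the cut-off.

    `vocabulary` maps a class label to a dict of {term: weight}.
    Returns (class, score) pairs sorted by descending score.
    """
    lowered = text.lower()
    scores = {}
    for cls, terms in vocabulary.items():
        score = sum(weight * lowered.count(term.lower()) for term, weight in terms.items())
        if score >= cutoff:
            scores[cls] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical controlled vocabulary: multi-word terms get a higher weight.
vocabulary = {
    "Bridges": {"suspension bridge": 2.0, "bridge": 1.0},
    "Concrete": {"reinforced concrete": 2.0, "concrete": 1.0},
    "Welding": {"welding": 1.0},
}
text = "Design of a suspension bridge with concrete towers"
print(classify_by_string_matching(text, vocabulary))
print(classify_by_string_matching(text, vocabulary, cutoff=2.0))
```

No training documents are involved: the only inputs are the vocabulary and the text, which is the property the abstract contrasts with machine learning.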

File description: application/pdf

Relation: https://lup.lub.lu.se/record/599083; urn:isbn:91-7167-042-4; https://portal.research.lu.se/files/5698449/599084.pdf

Availability: https://lup.lub.lu.se/record/599083
https://portal.research.lu.se/files/5698449/599084.pdf
