Academic Journal

A study on real-time low-quality content detection on Twitter from the users’ perspective.

Bibliographic Details
Title: A study on real-time low-quality content detection on Twitter from the users’ perspective.
Authors: Chen, Weiling, Yeo, Chai Kiat, Lau, Chiew Tong, Lee, Bu Sung
Source: PLoS ONE; 8/9/2017, Vol. 12 Issue 8, p1-22, 22p
Subject Terms: SOCIAL interaction, ONLINE social networks, EXPECTATION-maximization algorithms, WEB browsing
Reviews & Products: TWITTER (Web resource)
Abstract: Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on different categories of low-quality content. Both direct and indirect features including newly proposed features are identified to characterize all types of low-quality content. We then further combine word level analysis with the identified features and build a keyword blacklist dictionary to improve the detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word level analysis. The results of our research show that our method has a high accuracy of 0.9711 and a good F1 of 0.8379 based on a random forest classifier with real time performance in the detection of low-quality content in tweets. Our work therefore achieves a positive impact in improving user experience in browsing social media content. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
Description
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0182487