Academic Journal

Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter.

Bibliographic Details
Title: Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter.
Authors: Boligarla, Srikanth; Laison, Elda Kokoè Elolo; Li, Jiaxin; Mahadevan, Raja; Ng, Austen; Lin, Yangming; Thioub, Mamadou Yamar; Huang, Bruce; Ibrahim, Mohamed Hamza; Nasri, Bouchra
Source: BMC Medical Informatics & Decision Making; 10/16/2023, Vol. 23 Issue 1, p1-12, 12p
Subject Terms: DISEASE incidence, MACHINE learning, VECTOR-borne diseases, REPORTING of diseases, LYME disease, COMMUNICABLE diseases, SOCIAL media
Geographic Terms: UNITED States
Company/Entity: X Corp.
Abstract: Background: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than 90% of all vector-borne diseases in North America. Objective: In this paper, self-reported tweets on Twitter were analyzed in order to predict potential Lyme disease cases and accurately assess incidence rates in the US. Methods: The study was done in three stages: (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant Lyme disease tweets with geolocations. A subset of tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remaining portion was manually labelled, yielding a curated labelled dataset of 77,500 tweets. (2) This labelled dataset was used to train, validate, and test various combinations of NLP word embedding methods and prominent ML classification models, such as TF-IDF and logistic regression, Word2vec and XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Lastly, the presence of spatio-temporal patterns in the US over a 10-year period was studied. Results: Preliminary results showed that BERTweet outperformed all tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of 90%. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time. Conclusions: We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings have emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter were essential in raising popular awareness of Lyme disease. Third, counties with a high incidence rate were not necessarily associated with a high tweet rate, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
Description
ISSN: 1472-6947
DOI: 10.1186/s12911-023-02315-z