Academic Journal

Data collaboration analysis in predicting diabetes from a small amount of health checkup data.

Bibliographic Details
Title: Data collaboration analysis in predicting diabetes from a small amount of health checkup data.
Authors: Uchitachimoto, Go, Sukegawa, Noriyoshi, Kojima, Masayuki, Kagawa, Rina, Oyama, Takashi, Okada, Yukihiko, Imakura, Akira, Sakurai, Tetsuya
Source: Scientific Reports; 7/21/2023, Vol. 13 Issue 1, p1-8, 8p
Subject Terms: HEALTH facilities, DATA analysis, DIABETES, URBAN health, DECISION trees
Abstract: Recent studies showed that machine learning models such as gradient-boosting decision tree (GBDT) can predict diabetes with high accuracy from big data. In this study, we asked whether highly accurate prediction of diabetes is possible even from small data by expanding the amount of data through data collaboration (DC) analysis, a modern framework for integrating and analyzing data accumulated at multiple institutions while ensuring confidentiality. To this end, we focused on data from two institutions: health checkup data of 1502 citizens accumulated in Tsukuba City and health history data of 1399 patients collected at the University of Tsukuba Hospital. When using only the health checkup data, the ROC-AUC and Recall for logistic regression (LR) were 0.858 ± 0.014 and 0.970 ± 0.019, respectively, while those for GBDT were 0.856 ± 0.014 and 0.983 ± 0.016, respectively. When using also the health history data through DC analysis, these values for LR improved to 0.875 ± 0.013 and 0.993 ± 0.009, respectively, while those for GBDT deteriorated because of the low compatibility with a method used for confidential data sharing (although DC analysis brought improvements). Even in a situation where health checkup data of only 324 citizens are available, the ROC-AUC and Recall for LR were 0.767 ± 0.025 and 0.867 ± 0.04, respectively, thanks to DC analysis, indicating an 11% and 12% improvement. Thus, we concluded that the answer to the above question was "Yes" for LR but "No" for GBDT for the data set tested in this study. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
Description
ISSN: 2045-2322
DOI: 10.1038/s41598-023-38932-x