Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset

Bibliographic details
Title:	Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset
Authors:	Monique Tonani Novaes, Osmar Luiz Ferreira de Carvalho, Pedro Henrique Guimarães Ferreira, Taciana Leonel Nunes Tiraboschi, Caroline Santos Silva, Jean Carlos Zambrano, Cristiano Mendes Gomes, Eduardo de Paula Miranda, Osmar Abílio de Carvalho Júnior, José de Bessa Júnior
Source:	Informatics in Medicine Unlocked, Vol 23, Iss , Pp 100538- (2021)
Publication info:	Elsevier, 2021.
Publication year:	2021
Collection:	LCC:Computer applications to medicine. Medical informatics
Subject terms:	Machine learning, Imbalanced data, Testosterone deficiency, Ensemble classifier, Computer applications to medicine. Medical informatics, R858-859.7
Description:	Testosterone is the most important male sex hormone, and its deficiency causes considerable physical and mental harm. Efficiently identifying individuals with low testosterone is crucial before starting proper treatment. However, routine monitoring of testosterone levels can be costly in many regions, resulting in underreporting of cases, especially in developing countries. Moreover, few studies employ machine learning (ML) to predict testosterone deficiency. This research therefore aims to offer a coherent comparative analysis of machine learning methods that can predict testosterone deficiency without requiring patients to undergo costly medical tests. In doing so, we seek to provide the urological community with a publicly available dataset (https://github.com/osmarluiz/Testosterone-Deficiency-Dataset) to encourage research in this still untapped field. For this analysis, we used ten base classifiers (optimized with grid search and stratified K-fold cross-validation), three ensemble methods, and eight sampling strategies to analyze a total of 3397 patients. The analysis was based on six features (age, abdominal circumference, triglycerides, high-density lipoprotein, diabetes, and hypertension), all of which were obtained from low-cost exams. We compared the performance of the sampling strategies and the classifiers on an independent test set using ranking (PR-AUC), probabilistic (Brier score), and threshold metrics.
We found that: (1) for the ranking metrics, sampling strategies did not improve results in this slightly imbalanced (4:1 ratio) dataset; (2) the ensemble classifier using a weighted average performed best; (3) the best base classifier was XGBoost; (4) calibration yielded a significant improvement for the sampling strategies and a slight improvement for the no-sampling strategy; (5) McNemar's test indicated statistically similar results among all classifiers; and (6) abdominal circumference (AC) had by far the highest feature importance, followed by triglycerides (TG). Age showed very little significance in predicting testosterone deficiency.
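As a rough illustration of two of the evaluation tools named in the description, the sketch below computes a Brier score and McNemar's test using only the Python standard library. It is not the authors' code: the function names `brier_score` and `mcnemar_test` and the toy labels are hypothetical, and the continuity-corrected chi-square form of McNemar's test is an assumption, since the record does not say which variant was used.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means perfectly confident, perfectly correct predictions.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-square form with continuity correction, 1 d.o.f.).

    Compares two classifiers evaluated on the same test set by looking only
    at the cases where exactly one of them is correct.
    Returns (statistic, p_value).
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for y, a, c in zip(y_true, pred_a, pred_b) if a == y and c != y)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for y, a, c_ in zip(y_true, pred_a, pred_b) if a != y and c_ == y)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    pred_a = [1, 1, 1, 1, 0, 0, 0, 1]
    pred_b = [1, 0, 0, 0, 0, 0, 0, 0]
    print(brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
    print(mcnemar_test(y_true, pred_a, pred_b))
```

A large p-value from `mcnemar_test` means the two classifiers' error patterns are statistically indistinguishable on that test set, which is the kind of outcome finding (5) reports. In practice, library implementations of both metrics would normally be preferred over hand-written ones.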
Document type:	article
File description:	electronic resource
Language:	English
ISSN:	2352-9148
Relation:	http://www.sciencedirect.com/science/article/pii/S2352914821000289; https://doaj.org/toc/2352-9148
DOI:	10.1016/j.imu.2021.100538
Open access:	https://doaj.org/article/7ca6f7985c494a439acfafd410365d4a
Accession number:	edsdoj.7ca6f7985c494a439acfafd410365d4a
Database:	Directory of Open Access Journals
