Classification performance bias between training and test sets in a limited mammography dataset.

Bibliographic Details
Title:	Classification performance bias between training and test sets in a limited mammography dataset.
Authors:	Rui Hou, Joseph Y Lo, Jeffrey R Marks, E Shelley Hwang, Lars J Grimm
Source:	PLoS ONE, Vol 19, Iss 2, p e0282402 (2024)
Publisher Information:	Public Library of Science (PLoS), 2024.
Publication Year:	2024
Collection:	LCC:Medicine LCC:Science
Subject Terms:	Medicine, Science
Description:	Objectives: To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. Methods: Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was shuffled and split into training (n = 400) and test (n = 300) cases forty times. For each split, cross-validation was used for training, followed by an assessment on the test set. Regularized logistic regression and support vector machines were used as the machine learning classifiers. For each split and classifier type, multiple models were created based on radiomics and/or clinical features. Results: Area under the curve (AUC) varied considerably across the different data splits (e.g., radiomics regression model: train 0.58-0.70, test 0.59-0.73). Regression models showed a tradeoff in which better training performance led to worse test performance and vice versa. Cross-validation over all cases reduced this variability, but required samples of 500+ cases to yield representative estimates of performance. Conclusions: In medical imaging, clinical datasets are often relatively small. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. Advances in knowledge: Performance bias can result from model testing when using limited datasets. Optimal strategies for test set selection should be developed to ensure study conclusions are appropriate.
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	1932-6203
Relation:	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0282402&type=printableTest; https://doaj.org/toc/1932-6203Test
DOI:	10.1371/journal.pone.0282402
Open Access:	https://doaj.org/article/775eca960f4245d494e86a8fa3966f75Test
Accession Number:	edsdoj.775eca960f4245d494e86a8fa3966f75
Database:	Directory of Open Access Journals
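
The split procedure summarized in the abstract (700 cases, shuffled into 400 training and 300 test cases forty times, with AUC recorded for each split) can be illustrated with a minimal sketch. This is not the authors' pipeline: the cohort is synthetic, the 20% positive rate and the effect size are arbitrary assumptions, and a fixed noisy score stands in for a trained radiomics model, so no model fitting or cross-validation takes place. The names `auc` and `split_aucs` are hypothetical helpers introduced only for this example.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_aucs(scores, labels, n_train=400, n_splits=40, seed=0):
    """Shuffle the cases n_splits times and return (train AUC, test AUC) pairs."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        results.append((
            auc([scores[i] for i in train], [labels[i] for i in train]),
            auc([scores[i] for i in test], [labels[i] for i in test]),
        ))
    return results


# Synthetic cohort (assumption): 700 cases, roughly 20% positives,
# and a weakly informative score that is slightly higher for positives.
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.2 else 0 for _ in range(700)]
scores = [data_rng.gauss(0.4 * y, 1.0) for y in labels]

results = split_aucs(scores, labels)
train_aucs = [tr for tr, _ in results]
test_aucs = [te for _, te in results]
print(f"train AUC range: {min(train_aucs):.2f}-{max(train_aucs):.2f}")
print(f"test AUC range:  {min(test_aucs):.2f}-{max(test_aucs):.2f}")
```

Because the score is fixed across splits, any spread between the minimum and maximum AUC in this sketch comes from the sampling alone. A split that happens to place more separable cases in the training portion tends to leave less separable ones for testing, which is one simple way a train/test tradeoff of the kind described in the abstract can arise.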
