Is the new model better? One metric says yes, but the other says no. Which metric do I use?

Bibliographic Details
Title: Is the new model better? One metric says yes, but the other says no. Which metric do I use?
Authors: Zhou, Qian M., Lu, Zhe, Brooke, Russell J., Hudson, Melissa M, Yuan, Yan
Publication Year: 2020
Collection: Statistics
Subject Terms: Statistics - Methodology
Description: Incremental value (IncV) evaluates the performance change from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of conflicting conclusions is not uncommon, and it creates a dilemma in medical decision making. In this article, we examine the analytical connections and differences between two IncV metrics: IncV in AUC (IncV-AUC) and IncV in AP (IncV-AP). Additionally, since they are both semi-proper scoring rules, we compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (IncV-sBrS), via a numerical study. We demonstrate that both IncV-AUC and IncV-AP are weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, IncV-AP assigns heavier weights to the changes in the high-risk group, whereas IncV-AUC weights the changes equally. In the numerical study, we find that IncV-AP has a wide range, from negative to positive, but the size of IncV-AUC is much smaller. In addition, IncV-AP and IncV-sBrS are highly consistent, but IncV-AUC is negatively correlated with IncV-sBrS and IncV-AP at a low event rate. IncV-AUC and IncV-AP are the least consistent among the three pairs, and their differences are more pronounced as the event rate decreases.
Comment: 25 pages, 6 figures, 1 table. Compared to Version 1, the title and overall structure of the manuscript have been changed significantly
Document Type: Working Paper
Open Access: http://arxiv.org/abs/2010.09822
Accession Number: edsarx.2010.09822
Database: arXiv
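
Illustration (not part of the catalog record or the paper): the three incremental-value metrics named in the description can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions. The function names (`auc`, `average_precision`, `scaled_brier`, `incremental_values`) are hypothetical, the estimators are common textbook forms rather than necessarily the ones the authors used, and the ten-subject data set is made up purely to show how IncV-AUC and IncV-AP can disagree in sign.

```python
def auc(y, s):
    """Probability that a random event outranks a random non-event (ties count 0.5)."""
    events = [si for yi, si in zip(y, s) if yi == 1]
    nonevents = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


def average_precision(y, s):
    """Mean, over events, of the precision among subjects scoring at least as high."""
    total, n_events = 0.0, 0
    for yi, si in zip(y, s):
        if yi == 1:
            flagged = [yj for yj, sj in zip(y, s) if sj >= si]
            total += sum(flagged) / len(flagged)
            n_events += 1
    return total / n_events


def scaled_brier(y, p):
    """1 - Brier / (rate * (1 - rate)); assumes p holds predicted probabilities."""
    n = len(y)
    brier = sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / n
    rate = sum(y) / n
    return 1.0 - brier / (rate * (1.0 - rate))


def incremental_values(y, p_old, p_new):
    """IncV = metric(new model) - metric(existing model), for each metric."""
    metrics = (("IncV-AUC", auc),
               ("IncV-AP", average_precision),
               ("IncV-sBrS", scaled_brier))
    return {name: f(y, p_new) - f(y, p_old) for name, f in metrics}


# Made-up toy data: 2 events among 10 subjects (event rate 0.2).
y     = [0,    1,    1,    0,    0,    0,    0,    0,    0,    0]
p_old = [0.60, 0.50, 0.40, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05]
p_new = [0.50, 0.90, 0.20, 0.40, 0.30, 0.15, 0.12, 0.10, 0.08, 0.05]

result = incremental_values(y, p_old, p_new)
for name, value in result.items():
    print(f"{name}: {value:+.4f}")
# On this toy data AUC falls while AP rises: the new model ranks one event
# at the very top but pushes the other event below three more non-events.
```

The toy example mirrors the weighting argument in the description: moving an event to the top of the ranking is rewarded heavily by AP, while AUC treats every event/non-event pair equally and so registers the net loss in pairwise ordering.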