Benchmarking machine learning models to assist in the prognosis of tuberculosis

Bibliographic details
Title: Benchmarking machine learning models to assist in the prognosis of tuberculosis
Authors: Lubnnia Morais Florêncio Souza, Vanderson de Souza Sampaio, Patricia Takako Endo, Geovanne Oliveira Alves, Maicon Herverton Lino Ferreira da Silva Barros, Elisson Rocha, João Fausto Lorenzato de Oliveira, Theo Lynn
Source: da Silva Barros, Maicon Herverton Lino Ferreira ORCID: 0000-0002-0275-3298 <https://orcid.org/0000-0002-0275-3298>, Alves, Geovanne Oliveira ORCID: 0000-0002-2084-5516 <https://orcid.org/0000-0002-2084-5516>, Souza, Lubnnia Morais Florêncio ORCID: 0000-0002-2188-6272 <https://orcid.org/0000-0002-2188-6272>, da Silva Rocha, Elisson ORCID: 0000-0002-7742-2995 <https://orcid.org/0000-0002-7742-2995>, Oliveira, João Fausto Lorenzato de ORCID: 0000-0002-1150-4904 <https://orcid.org/0000-0002-1150-4904>, Lynn, Theo ORCID: 0000-0001-9284-7580 <https://orcid.org/0000-0001-9284-7580>, Sampaio, Vanderson ORCID: 0000-0001-7307-8851 <https://orcid.org/0000-0001-7307-8851> and Takako Endo, Patricia ORCID: 0000-0002-9163-5583 <https://orcid.org/0000-0002-9163-5583> (2021) Benchmarking machine learning models to assist in the prognosis of tuberculosis. Informatics, 8 (2). ISSN 2227-9709
Informatics, Vol 8, Iss 2, p 27 (2021)
Informatics
Volume 8
Issue 2
Publication info: MDPI, 2021.
Publication year: 2021
Subject terms: imbalanced data sets, Tuberculosis, Exacerbation, Computer Networks and Communications, Disease, Machine learning, computer.software_genre, 03 medical and health sciences, feature selection, benchmark, 0302 clinical medicine, random search, Health care, medicine, ensemble model, 030212 general & internal medicine, 030304 developmental biology, 0303 health sciences, lcsh:T58.5-58.64, business.industry, lcsh:Information technology, Communication, Benchmarking, medicine.disease, Random forest, Human-Computer Interaction, tuberculosis, neglected tropical disease, prognosis, machine learning, Infectious disease (medical specialty), Gradient boosting, Artificial intelligence, business, computer
Citation: Lino Ferreira da Silva Barros, M.H.; Oliveira Alves, G.; Morais Florêncio Souza, L.; da Silva Rocha, E.; Lorenzato de Oliveira, J.F.; Lynn, T.; Sampaio, V.; Endo, P.T. Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis. Informatics 2021, 8, 27. https://doi.org/10.3390/informatics8020027
Academic Editors: Renato Umeton and Gregory Antell
Received: 8 March 2021; Accepted: 9 April 2021; Published: 15 April 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Description: Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory results including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data.
Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB and Multi-Layer Perceptron (MLP) models is the best model for predicting the cure class.
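To give a sense of the class imbalance described in the abstract, the following minimal Python sketch rebuilds the two class sizes reported there (22,876 cured, 1139 deaths) and balances them by random undersampling. The abstract does not say how the authors built their balanced data set, so undersampling is an assumption made purely for illustration; the `undersample` helper, the label strings and the seed are hypothetical names, not part of the original work.

```python
import random
from collections import Counter

# Class counts reported in the abstract for the cleaned data set.
N_CURED = 22_876
N_DEATHS = 1_139


def undersample(labels, seed=0):
    """Return sorted indices that keep every class at the size of the rarest one.

    Assumption: random undersampling is only one possible balancing strategy;
    the abstract does not state which one the authors actually used.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Size of the rarest class; every class is cut down to this many items.
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


# Synthetic label vector with the reported class sizes (no real patient data).
labels = ["cured"] * N_CURED + ["death"] * N_DEATHS
imbalance_ratio = N_CURED / N_DEATHS
death_share = N_DEATHS / len(labels)

balanced = [labels[i] for i in undersample(labels)]
balanced_counts = Counter(balanced)

print(f"records: {len(labels)}")
print(f"cured per death: {imbalance_ratio:.1f}")
print(f"death share: {death_share:.2%}")
print(f"balanced counts: {dict(balanced_counts)}")
```

With roughly twenty cured records for every death, a model that always predicts "cured" would already look accurate, which is presumably why the study compares imbalanced and balanced settings separately.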
File description: application/pdf
Language: English
Open access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa86013aa04ccc488d9c71f5901ff943
http://doras.dcu.ie/27522/
Rights: OPEN
Accession number: edsair.doi.dedup.....fa86013aa04ccc488d9c71f5901ff943
Database: OpenAIRE