Explainable Machine Learning Prediction for Mortality of COVID-19 in the Colombian Population
Abstract The COVID-19 pandemic, which began in late 2019, has become a global public health problem, with large numbers of people infected and dead. One of the greatest challenges in dealing with the disease is identifying the people most at risk of becoming infected, falling seriously ill, and dying from the virus, so that they can be isolated in a targeted manner and mortality rates reduced. This article proposes the use of machine learning, specifically neural networks and random forests, to build two complementary models that estimate a person's probability of dying from COVID-19. The models are trained on the demographic information and medical history of two population groups: 43,000 people who died from COVID-19 in Colombia during 2020, and a random sample of 43,000 people who became ill with COVID-19 during the same period but later recovered. On evaluation, the trained neural network classification model achieved an accuracy of 88%. However, transparency is a major requirement for the explainability of COVID-19 prognosis. Therefore, a complementary random forest model is trained to identify the most significant predictors of COVID-19 mortality.
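The two-model approach summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic, balanced data in place of the Colombian records. The feature names, network size, and all hyperparameters are hypothetical choices made for illustration and are not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical predictor names; the article's actual variables may differ.
FEATURES = ["age", "sex", "diabetes", "hypertension", "obesity", "copd"]

# Synthetic stand-in for a balanced sample (label 1 = died, 0 = recovered).
X, y = make_classification(
    n_samples=2000,
    n_features=len(FEATURES),
    n_informative=4,
    n_redundant=0,
    weights=[0.5, 0.5],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Model 1: a small neural network classifier for the mortality prediction.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)
accuracy = accuracy_score(y_test, mlp.predict(X_test))

# Model 2: a random forest whose impurity-based feature importances
# provide a ranking of the predictors.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
ranking = sorted(
    zip(FEATURES, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"Neural network accuracy on the test split: {accuracy:.3f}")
for name, importance in ranking:
    print(f"{name:>12}: {importance:.3f}")
```

Because the data here are synthetic, the printed accuracy and ranking say nothing about COVID-19; the sketch only shows how a predictive model and an interpretable model can be trained side by side on the same features.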