Explainable Machine Learning Prediction for COVID-19 Mortality in the Colombian Population
Abstract
The COVID-19 pandemic, which began in late 2019, has become a global public health problem, causing large numbers of infections and deaths. One of the greatest challenges in dealing with the disease is identifying the people most at risk of becoming infected, becoming seriously ill, and dying from the virus, so that they can be isolated in a targeted manner to reduce mortality rates. This article proposes using machine learning, specifically neural networks and random forests, to build two complementary models that estimate a person's probability of dying from COVID-19. The models are trained on the demographic information and medical history of two population groups: 43,000 people who died from COVID-19 in Colombia during 2020, and a random sample of 43,000 people who became ill with COVID-19 during the same period but later recovered. After training, the neural network classification model was evaluated and reached an accuracy of 88%. However, transparency is a major requirement for an explainable COVID-19 prognosis. Therefore, a complementary random forest model was trained to identify the most significant predictors of COVID-19 mortality.
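The abstract describes a balanced training set, built from every deceased patient plus an equally sized random sample of recovered patients, and reports model accuracy. The Python sketch below illustrates those two steps under stated assumptions. The record layout, the `died` label field, and the helper names `balance_by_undersampling` and `accuracy` are hypothetical, and the data are synthetic. It is not the authors' pipeline and covers neither the neural network nor the random forest.

```python
import random
from typing import Dict, List, Sequence


def balance_by_undersampling(records: List[Dict], label_key: str = "died",
                             seed: int = 0) -> List[Dict]:
    """Keep every positive (deceased) record and an equally sized random
    sample of negative (recovered) records. Hypothetical helper."""
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    if len(negatives) < len(positives):
        raise ValueError("not enough negative records to match the positive class")
    rng = random.Random(seed)
    # Sample recovered patients without replacement to match the deceased count.
    sampled = rng.sample(negatives, len(positives))
    balanced = positives + sampled
    rng.shuffle(balanced)  # avoid ordering by class before training
    return balanced


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic example: 5 deceased and 50 recovered patients (made-up data).
records = ([{"age": 70 + i % 20, "died": 1} for i in range(5)]
           + [{"age": 30 + i % 40, "died": 0} for i in range(50)])
balanced = balance_by_undersampling(records)
print(len(balanced), sum(r["died"] for r in balanced))  # 10 5
print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))  # 0.75
```

Undersampling the majority class in this way gives a 50/50 class split, so a classifier cannot score well simply by always predicting "recovered"; on such a balanced set, accuracy is a more informative summary than it would be on the raw, imbalanced population.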