Márcio Luís Moreira De Souza, Gabriel Ayres Lopes, Alexandre Castelo Branco, Jessica K Fairley, Lucia Alves De Oliveira Fraga
BACKGROUND
According to the WHO, achieving the leprosy control targets set for 2030 will require elimination of the disease and interruption of transmission at the national or regional level. India and Brazil have reported the highest leprosy burdens for decades, which shows the need for strategies and tools that help health professionals correctly manage and control the disease.
OBJECTIVE
This study aims to assess the quality of the leprosy data recorded in Brazil's SINAN (Information System for Notifiable Diseases) and to build a web application that makes an accurate method for classifying leprosy cases for treatment more accessible to health professionals, especially in communities far from the country's major diagnostic centers.
METHODS
Leprosy data were extracted from the SINAN database, carefully cleaned, and used to build an artificial intelligence (AI) decision model based on the Random Forest (RF) algorithm to predict the operational classification as paucibacillary (PB) or multibacillary (MB). We used (i) Python to extract and clean the data and (ii) R to train and test the AI model via cross-validation. To allow broad access, we deployed the final RF classification model in a web application that integrates the Microsoft Azure cloud service with a user-friendly interface built in Bubble.io. The study also used data available from the IBGE (Brazilian Institute of Geography and Statistics) and DATASUS (Department of Informatics of the Unified Health System).
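The abstract states only that an RF model was trained and tested in R with cross-validation. As an illustration, the sketch below shows the same idea in plain Python (a swapped-in language, not the authors' code): a minimal random forest built from bootstrap samples and random feature subsets, simplified to depth-1 trees (decision stumps), and evaluated with stratified k-fold cross-validation. The feature pair (number of skin lesions, number of affected nerves), the number of trees, and the toy data are hypothetical assumptions.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, fallback):
    """Most common label, or `fallback` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    default = majority(y, "PB")
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, default), majority(right, default))
    return best[1:]  # (feature, threshold, label if <= t, label if > t)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bootstrap samples plus random feature subsets: a minimal forest of stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(math.sqrt(p)))  # features considered per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote of all stumps."""
    votes = [lo if row[f] <= t else hi for f, t, lo, hi in forest]
    return Counter(votes).most_common(1)[0][0]


def stratified_folds(y, k, seed=0):
    """Split indices into k folds that keep roughly the same PB/MB proportion."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, k=5):
    """Mean accuracy over k stratified folds (each fold is held out once)."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        forest = fit_forest([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(forest, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic toy cases, NOT SINAN data: (number of skin lesions, number of affected nerves).
X = [(n, n // 3) for n in range(1, 6)] * 4 + [(n, n // 3) for n in range(6, 11)] * 4
y = ["PB"] * 20 + ["MB"] * 20
print(f"5-fold CV accuracy: {cross_validate(X, y):.2f}")
```

Stratifying the folds keeps the PB/MB ratio similar in every test fold, which matters when one class is more frequent. A full implementation would grow deeper trees, as R's `randomForest` package does by default.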
RESULTS
We mapped the spatial distribution of leprosy incidence in Brazil from 2014 to 2018 and observed a high number of cases in central Brazil in 2014, which became even higher by 2018 in the state of Mato Grosso.
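The abstract does not say how incidence was computed for the map. A common measure is the new case detection rate per 100,000 inhabitants, which combines case counts (here, from SINAN) with population estimates (here, assumed to come from IBGE). The sketch below shows that arithmetic; the area names and all figures are hypothetical, not study data.

```python
def incidence_per_100k(new_cases, population):
    """New case detection rate per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population * 100_000


# Hypothetical inputs (NOT real SINAN/IBGE figures): {(area, year): (new cases, population)}
records = {
    ("Area A", 2014): (120, 400_000),
    ("Area A", 2018): (180, 420_000),
    ("Area B", 2014): (15, 900_000),
    ("Area B", 2018): (12, 950_000),
}

rates = {key: incidence_per_100k(c, p) for key, (c, p) in records.items()}
for area in sorted({a for a, _ in records}):
    r14, r18 = rates[(area, 2014)], rates[(area, 2018)]
    trend = "up" if r18 > r14 else "down"
    print(f"{area}: {r14:.1f} -> {r18:.1f} per 100,000 ({trend})")
```

Normalizing by population lets sparsely and densely populated areas be compared on the same map, which raw case counts would not allow.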
Some municipalities showed discrepancy rates in the 80% range. We defined an inconsistency as a record that does not match the WHO standards for leprosy classification. Of the 21,047 discrepancies detected, the most common was an operational classification that did not match the clinical form. After data processing, we identified a total of 77,628 cases with missing data.
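The most common inconsistency described above, an operational classification that contradicts the clinical form, can be checked record by record. The sketch below counts records with missing required fields and flags form/class mismatches. It assumes the Madrid clinical forms with indeterminate and tuberculoid mapping to PB and borderline and lepromatous mapping to MB; the field names are hypothetical, since the abstract does not give the SINAN variable names or the exact missing-data criterion.

```python
# ASSUMPTION: expected operational class for each clinical form (Madrid classification).
# Forms not listed here (e.g. "not classified") are not checked.
EXPECTED_CLASS = {
    "indeterminate": "PB",
    "tuberculoid": "PB",
    "borderline": "MB",
    "lepromatous": "MB",
}

# Hypothetical field names; the real SINAN variable names are not given in the abstract.
REQUIRED_FIELDS = ("clinical_form", "operational_class")


def audit(records):
    """Return (number of records with missing data, indices of inconsistent records)."""
    missing, inconsistent = 0, []
    for i, rec in enumerate(records):
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            missing += 1
            continue  # cannot check consistency without both fields
        expected = EXPECTED_CLASS.get(rec["clinical_form"])
        if expected is not None and expected != rec["operational_class"]:
            inconsistent.append(i)
    return missing, inconsistent


records = [
    {"clinical_form": "tuberculoid", "operational_class": "PB"},
    {"clinical_form": "lepromatous", "operational_class": "PB"},  # class contradicts form
    {"clinical_form": "indeterminate", "operational_class": None},  # missing value
    {"clinical_form": "borderline", "operational_class": "MB"},
]
print(audit(records))
```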
The AI model achieved a sensitivity of 93.97% and a specificity of 87.09%. In most Brazilian states, the confidence intervals for human and machine classification accuracy overlapped.
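Sensitivity and specificity come from the model's confusion matrix, and the human-versus-machine comparison rests on whether two accuracy confidence intervals overlap. The sketch below shows these calculations under stated assumptions: MB is treated as the positive class, the 95% intervals use the Wilson score method (the abstract does not name the method used), and all counts are hypothetical, not study results.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP). MB is assumed positive."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def overlaps(a, b):
    """True if intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical confusion-matrix counts for one state (NOT study results).
tp, fn, tn, fp = 470, 30, 435, 65
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
machine_ci = wilson_ci(tp + tn, tp + fn + tn + fp)  # model accuracy
human_ci = wilson_ci(880, 1000)  # hypothetical accuracy of professionals' classification
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
print(f"machine CI={machine_ci}, human CI={human_ci}, overlap={overlaps(machine_ci, human_ci)}")
```

Overlapping intervals only mean that the data cannot distinguish the two accuracies at that confidence level; with few cases in a state the intervals widen, which is one way data volume and quality show up as state-to-state variation.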
CONCLUSIONS
The proposed app was able to recognize patterns in leprosy cases registered in the SINAN database and classify new patients as PB or MB, reducing the likelihood of oversight by health professionals. The collection and notification of leprosy data in Brazil appear to lack specific validation steps that would improve data quality for AI applications. The AI model implemented in this work showed relatively wide confidence intervals for accuracy that varied across Brazilian states. This distortion is possibly due to the quality of the data that fed the classification model.