The Use of Multiple Correspondence Analysis and K-means to Explore Associations between Risk Factors and Patient Characteristics in Colorectal Cancer (Preprint)
BACKGROUND Previous studies have shown that certain risk factors are associated with an increased risk of colorectal cancer.

OBJECTIVE This study aimed to detect these associations in the region of Lleida (Catalonia) using Multiple Correspondence Analysis (MCA) and K-means.

METHODS This cross-sectional study comprised 1,085 colorectal cancer episodes recorded between 2012 and 2015, extracted from the Population-based Cancer Registry (PCR) for the province of Lleida (Spain), the Primary Care Centers database, and the Catalan Health Service Register. Relations between risk factors and patient characteristics were identified using MCA and K-means.

RESULTS Combining these techniques helped to detect clusters of patients with similar risk factors. Risk of death was associated with older age and with obesity or overweight. Stage III was related to patients aged ≥65 and to rural/semi-urban populations, while younger patients were related to stage 0.

CONCLUSIONS MCA and K-means were of considerable help in detecting associations between risk factors and patient characteristics. These techniques proved to be effective tools for analyzing the role of some factors in colorectal cancer. The outcomes help to corroborate suspected trends and to stimulate new hypotheses about patients' previous clinical history and how to prevent the disease.
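
To illustrate how MCA and K-means can be combined, the following is a minimal sketch in Python with NumPy, not the study's actual pipeline. The abstract does not state the software, the variables, or the number of clusters used, so everything here is an assumption: the twelve toy records (age group, BMI category, sex) are invented, the function names `one_hot`, `mca_row_coords`, and `kmeans` are hypothetical, MCA is computed as a correspondence analysis of the indicator matrix, and K-means uses a simple deterministic farthest-point initialization rather than any particular library default.

```python
import numpy as np


def one_hot(records):
    """Build the indicator matrix: one column per (variable, category) pair."""
    n_vars = len(records[0])
    columns = sorted({(j, rec[j]) for rec in records for j in range(n_vars)})
    index = {col: k for k, col in enumerate(columns)}
    Z = np.zeros((len(records), len(columns)))
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            Z[i, index[(j, value)]] = 1.0
    return Z, columns


def mca_row_coords(Z, n_components=2):
    """MCA as correspondence analysis of the indicator matrix Z."""
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates on the leading dimensions.
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Invented toy records (age group, BMI category, sex); NOT data from the study.
records = (
    [("65+", "overweight", "M")] * 3 + [("65+", "overweight", "F")] * 3
    + [("<65", "normal", "M")] * 3 + [("<65", "normal", "F")] * 3
)

Z, columns = one_hot(records)
coords = mca_row_coords(Z, n_components=2)   # patients in MCA space
labels, centers = kmeans(coords, k=2)        # cluster the MCA coordinates
print(labels)
```

In this toy example the first MCA dimension is driven by age group and BMI category, which vary together, so K-means on the MCA coordinates groups the records by that shared profile. With real registry data, the number of retained dimensions and the number of clusters would have to be chosen and justified separately.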