A highly accurate model for screening prostate cancer using propensity index panel of ten genes
Prostate-specific antigen (PSA) is a key biomarker commonly used to screen patients for prostate cancer. Because of the poor accuracy of PSA-based screening, a significant number of unnecessary biopsies are performed every year. In this study, we identified alternative biomarkers based on gene expression that can be used to screen for prostate cancer with high accuracy. All models were trained and tested on the gene expression profiles of 500 prostate cancer samples and 51 normal samples. Numerous feature selection techniques were used to identify potential biomarkers, and these biomarkers were used to develop models with different machine learning techniques for predicting prostate cancer samples. Our logistic regression-based model achieved the highest AUROC of 0.91, with an accuracy of 82.42% on the validation dataset. We introduce a new approach called the propensity index, in which the expression of a gene is converted into a propensity. This propensity-based approach significantly improved the performance of the classification models, achieving an AUROC of 0.99 with an accuracy of 96.36% on the validation dataset. We also identified and ranked selected genes that can be used to discriminate prostate cancer patients from healthy individuals with high accuracy. Single-gene biomarkers achieved an accuracy of only around 90%. The best performance in this study was obtained with a random forest model built on the propensity index of a panel of 10 genes.
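The abstract does not define how expression is converted into a propensity. The following is a minimal sketch of one plausible reading, given purely as an illustrative assumption and not as the method used in the study: expression values of a single gene are binned on the training data, and each bin is mapped to the fraction of cancer samples it contains. The function names `fit_propensity` and `to_propensity`, the quantile-style binning, and the `n_bins` parameter are all hypothetical.

```python
from bisect import bisect_right


def fit_propensity(values, labels, n_bins=4):
    """Learn bin edges and a per-bin cancer fraction for one gene.

    ASSUMPTION: this binned class-fraction scheme is illustrative only;
    the source does not specify how propensity is computed.
    values: expression values of one gene in the training samples
    labels: 1 for cancer, 0 for normal (same order as values)
    """
    ordered = sorted(values)
    # Quantile-style cut points taken from the sorted training values.
    edges = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]
    counts = [[0, 0] for _ in range(n_bins)]  # [samples in bin, cancer in bin]
    for value, label in zip(values, labels):
        b = bisect_right(edges, value)
        counts[b][0] += 1
        counts[b][1] += label
    overall = sum(labels) / len(labels)
    # Empty bins fall back to the overall cancer fraction.
    table = [(cancer / n if n else overall) for n, cancer in counts]
    return edges, table


def to_propensity(value, edges, table):
    """Replace a raw expression value by the propensity of its bin."""
    return table[bisect_right(edges, value)]


# Toy data (made up for illustration, not taken from the study).
expression = [1, 2, 3, 4, 5, 6, 7, 8]
is_cancer = [0, 0, 0, 1, 0, 1, 1, 1]
edges, table = fit_propensity(expression, is_cancer, n_bins=2)
print(to_propensity(2.5, edges, table))
print(to_propensity(9, edges, table))
```

Under this assumed scheme, the transformed values of each gene in a panel would replace the raw expression values as inputs to a classifier such as logistic regression or random forest. The bin edges and propensity table would need to be learned on training samples only, so that no label information from the validation set leaks into the features.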