CancerEMC: frontline non-invasive cancer screening from circulating protein biomarkers and mutations in cell-free DNA
Abstract
Motivation: Detecting cancer early through accessible blood tests can enable earlier patient intervention. Although cancer detection from cell-free DNA (cfDNA) has advanced, its accuracy remains uncertain. Given the central importance and broad impact of this problem, we set out to address this challenge.
Methods: We propose CancerEMC, a bagging Ensemble Meta Classifier for early cancer detection from circulating protein biomarkers and cfDNA mutations measured in blood. CancerEMC is designed for both binary cancer detection and multi-class cancer type localization. It addresses the class imbalance in multi-analyte blood test data using robust oversampling and adaptive synthesis techniques.
Results: On clinical blood test data, CancerEMC outperformed other algorithms and state-of-the-art studies, including CancerSEEK (published in Science, 2018), in cancer detection. It achieved the best performance for both binary cancer classification, with 99.1748% accuracy (AUC = 0.999), and multi-class cancer localization, with 74.1214% accuracy (AUC = 0.938). When oversampling is used to address the data imbalance, accuracy increases to 91.4966% (AUC = 0.992), whereas the state-of-the-art method reaches an estimated 69.64% (AUC = 0.921). Similar results are observed on independent, held-out test data.
Availability: https://github.com/saifurcubd/Cancer-Detection
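The method described above combines two standard ideas: rebalancing classes by synthesizing extra minority-class samples, and bagging (fitting several base models on bootstrap resamples and combining them by majority vote). The following is a minimal plain-Python sketch of both ideas under stated assumptions, not the CancerEMC implementation. It assumes SMOTE-style interpolation as the oversampler and a nearest-centroid classifier as the base learner. The function names (`smote_oversample`, `bagging_fit`, `bagging_predict`, etc.), the base learner, and the toy data are hypothetical illustrations, not components or data from the paper.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote_oversample(X, y, k=3, seed=0):
    """SMOTE-style oversampling (an assumed stand-in, not CancerEMC's exact
    technique): each synthetic sample lies on the segment between a random
    same-class sample and one of its k nearest same-class neighbours.
    Every class is grown to the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            base = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sq_dist(base, m),
            )[:k]
            if not neighbours:  # single-member class: fall back to duplication
                X_out.append(list(base))
            else:
                nb = rng.choice(neighbours)
                gap = rng.random()  # interpolation factor in [0, 1)
                X_out.append([a + gap * (b - a) for a, b in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    counts = Counter(y)
    sums = {}
    for x, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Bagging: fit each base model on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy, made-up data (not from the paper): 6 "normal" vs 2 "cancer" samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.0, 0.0],
     [5.0, 5.1], [5.2, 4.9]]
y = ["normal"] * 6 + ["cancer"] * 2

X_bal, y_bal = smote_oversample(X, y, k=1)
print(Counter(y_bal))  # both classes now have 6 samples
models = bagging_fit(X_bal, y_bal)
print(bagging_predict(models, [4.8, 5.0]))  # cancer
print(bagging_predict(models, [0.1, 0.1]))  # normal
```

For brevity the toy example oversamples the whole dataset. In a real evaluation, oversampling should be applied only to the training split, because synthesizing samples before splitting leaks information from the test set into training.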