Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

Author(s):  
Nahúm Cueto López ◽  
María Teresa García-Ordás ◽  
Facundo Vitelli-Storelli ◽  
Pablo Fernández-Navarro ◽  
Camilo Palazuelos ◽  
...  

This study evaluates several feature ranking techniques together with several machine learning classifiers in order to identify factors relevant to the probability of contracting breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Breast cancer is a major public health problem. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Likewise, quantifying the stability of feature selection methods becomes essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of the performance of a set of predictive models. Furthermore, their robustness is quantified to analyze both the similarity between the feature selection rankings and the stability of each ranking. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). The top 47 ranked features obtained with this approach, fed to a Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable (as did Random Forest), whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that the stability and the performance of the model should be studied together: Random Forest and SVM-RFE turned out to be the most stable algorithms, but in terms of model performance SVM-RFE outperforms Random Forest.
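The evaluation loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, a synthetic dataset from `make_classification` in place of the MCC-Spain data, and hypothetical choices for the fold count, the SVM regularization, and the top-k Jaccard index as the stability measure. The helper names `svm_rfe_ranking`, `cv_auc_top_k`, and `top_k_jaccard` are invented for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_rfe_ranking(X, y):
    """Rank features with SVM-RFE; rank 1 marks the most relevant feature."""
    # Eliminating one feature per step down to a single survivor yields a
    # full ranking (a permutation of 1..n_features).
    selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=1, step=1)
    selector.fit(StandardScaler().fit_transform(X), y)
    return selector.ranking_


def cv_auc_top_k(X, y, k, n_splits=5, seed=0):
    """Mean cross-validated AUC of Logistic Regression on the top-k features.

    The ranking is recomputed inside each training fold so that held-out
    samples never influence feature selection.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train, test in cv.split(X, y):
        top = np.argsort(svm_rfe_ranking(X[train], y[train]))[:k]
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        model.fit(X[train][:, top], y[train])
        scores = model.predict_proba(X[test][:, top])[:, 1]
        aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs))


def top_k_jaccard(rank_a, rank_b, k):
    """Jaccard overlap of the top-k feature sets of two rankings (assumed stability measure)."""
    a = set(np.argsort(rank_a)[:k].tolist())
    b = set(np.argsort(rank_b)[:k].tolist())
    return len(a & b) / len(a | b)


# Synthetic stand-in data (assumption): 300 samples, 20 features, 5 informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

print("AUC, top 5 features:", cv_auc_top_k(X, y, k=5))
print("AUC, all features:  ", cv_auc_top_k(X, y, k=X.shape[1]))

# Stability: compare rankings obtained on two bootstrap resamples.
rng = np.random.default_rng(0)
idx_a = rng.integers(0, len(y), len(y))
idx_b = rng.integers(0, len(y), len(y))
rank_a = svm_rfe_ranking(X[idx_a], y[idx_a])
rank_b = svm_rfe_ranking(X[idx_b], y[idx_b])
print("Top-5 Jaccard stability:", top_k_jaccard(rank_a, rank_b, k=5))
```

Comparing the top-k AUC with the full-feature AUC mirrors the comparison reported in the abstract, and repeating the bootstrap comparison over many resample pairs would give a more reliable stability estimate than the single pair shown here.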

2017 ◽  
Vol 1 (1) ◽  
pp. 53-59 ◽  
Author(s):  
Lance T. Pflieger ◽  
Clinton C. Mason ◽  
Julio C. Facelli

Introduction. Family health history (FHx) is an important factor in breast and ovarian cancer risk assessment. As such, multiple risk prediction models rely strongly on FHx data when identifying a patient’s risk. These models were developed using verified information and, when translated into a clinical setting, assume that a patient’s FHx is accurate and complete. However, FHx information collected in a typical clinical setting is known to be imprecise, and it is not well understood how this uncertainty may affect predictions in clinical settings. Methods. Using Monte Carlo simulations and existing measurements of uncertainty of self-reported FHx, we show how uncertainty in FHx information can alter risk classification when used in typical clinical settings. Results. We found that correct tier-level classification of pedigrees ranged from 52% to 64% across models under a set of contrived uncertain conditions, and that significant misclassifications are not negligible. Conclusions. Our work implies that (i) uncertainty quantification needs to be considered when transferring tools from a controlled research environment to a more uncertain environment (i.e., a health clinic) and (ii) better FHx collection methods are needed to reduce uncertainty in breast cancer risk prediction in clinical settings.
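The Monte Carlo idea in the Methods can be illustrated with a toy simulation. The sketch below is not the authors' model: the tier rule in `risk_tier` (a count of reported affected relatives), the example pedigree, and the sensitivity and specificity values are all hypothetical placeholders chosen only to show how reporting error propagates into tier misclassification.

```python
import random


def risk_tier(n_affected):
    """Hypothetical tier rule (assumption): based only on the affected count."""
    if n_affected >= 2:
        return "high"
    if n_affected == 1:
        return "moderate"
    return "average"


def self_report(true_status, sensitivity, specificity, rng):
    """Simulate one noisy self-reported family history.

    An affected relative is reported as affected with probability
    `sensitivity`; an unaffected relative is falsely reported as affected
    with probability 1 - `specificity`.
    """
    reported = []
    for affected in true_status:
        if affected:
            reported.append(rng.random() < sensitivity)
        else:
            reported.append(rng.random() >= specificity)
    return reported


def correct_tier_rate(true_status, sensitivity, specificity,
                      n_trials=10_000, seed=0):
    """Fraction of simulated reports whose tier matches the true tier."""
    rng = random.Random(seed)
    true_tier = risk_tier(sum(true_status))
    hits = 0
    for _ in range(n_trials):
        reported = self_report(true_status, sensitivity, specificity, rng)
        if risk_tier(sum(reported)) == true_tier:
            hits += 1
    return hits / n_trials


# Hypothetical pedigree: 2 affected relatives out of 8 (true tier "high").
pedigree = [True, True] + [False] * 6

# Placeholder accuracy values, not measurements from the literature.
rate = correct_tier_rate(pedigree, sensitivity=0.8, specificity=0.95)
print(f"Correct tier-level classification: {rate:.1%}")
```

With perfect reporting (sensitivity and specificity both 1.0) the rate is exactly 1; lowering either value shows how quickly the tier assignment degrades, which is the effect the abstract quantifies for real risk models.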


2019 ◽  
Vol 121 (1) ◽  
pp. 76-85 ◽  
Author(s):  
Javier Louro ◽  
Margarita Posso ◽  
Michele Hilton Boon ◽  
Marta Román ◽  
Laia Domingo ◽  
...  

Author(s):  
Allison W. Kurian ◽  
Antonis C. Antoniou ◽  
Susan M. Domchek

Recent advances in genomic technology have enabled far more rapid, less expensive sequencing of multiple genes than was possible only a few years ago. Advances in bioinformatics also facilitate the interpretation of large amounts of genomic data. New strategies for cancer genetic risk assessment include multiplex sequencing panels of 5 to more than 100 genes (in which rare mutations are often associated with at least two times the average risk of developing breast cancer) and panels of common single-nucleotide polymorphisms (SNPs), combinations of which are generally associated with more modest cancer risks (more than twofold). Although these new multiple-gene panel tests are used in oncology practice, questions remain about the clinical validity and the clinical utility of their results. To translate this increasingly complex genetic information for clinical use, cancer risk prediction tools are under development that consider the joint effects of all susceptibility genes, together with other established breast cancer risk factors. Risk-adapted screening and prevention protocols are underway, with ongoing refinement as genetic knowledge grows. Priority areas for future research include the clinical validity and clinical utility of emerging genetic tests; the accuracy of developing cancer risk prediction models; and the long-term outcomes of risk-adapted screening and prevention protocols, in terms of patients’ experiences and survival.

