Radiomic Feature Reduction Approach to Predict Breast Cancer by Contrast-Enhanced Spectral Mammography Images

Diagnostics ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 684
Author(s):  
Raffaella Massafra ◽  
Samantha Bove ◽  
Vito Lorusso ◽  
Albino Biafora ◽  
Maria Colomba Comes ◽  
...  

Contrast-enhanced spectral mammography (CESM) is an advanced instrument for breast care that is still operator dependent. This paper proposes an automated system that discriminates benign from malignant breast lesions based on radiomic analysis. We selected a set of 58 regions of interest (ROIs) extracted from 53 patients referred to Istituto Tumori “Giovanni Paolo II” of Bari (Italy) for the breast cancer screening phase between March 2017 and June 2018. We extracted 464 features of different kinds, such as points and corners of interest and textural and statistical features, from both the original ROIs and the ones obtained by a Haar decomposition and a gradient image implementation. The feature data were high-dimensional, which can affect both the process and the accuracy of cancer classification, so a dimension reduction scheme was needed. Specifically, we used a principal component analysis (PCA) dimension reduction technique that includes the calculation of variance proportion for eigenvector selection. For classification, we trained three different classifiers, namely a random forest, a naïve Bayes and a logistic regression, on each subset of principal components (PCs) selected by a sequential forward algorithm. Moreover, we focused on the starting features that contributed most to the calculation of the PCs that returned the best classification models. The random forest classifier gave the best prediction of benign/malignant ROIs, with median values for sensitivity and specificity of 88.37% and 100%, respectively, using only three PCs. The features that contributed most to the definition of these PCs were almost all extracted from the low-energy (LE) images. Our system could represent a valid support tool for radiologists for interpreting CESM images.
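The variance-proportion step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the PCA eigenvalues are already available and that components are kept until a cumulative proportion of variance is reached; the function names, the 0.9 threshold, and the toy eigenvalues are hypothetical and are not taken from the paper, which goes on to select PCs with a sequential forward algorithm.

```python
def variance_proportions(eigenvalues):
    """Proportion of total variance carried by each eigenvalue, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [value / total for value in ordered]


def components_for_variance(eigenvalues, threshold=0.9):
    """Smallest number of leading components whose cumulative proportion
    of variance reaches `threshold` (a hypothetical cut-off)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    cumulative = 0.0
    proportions = variance_proportions(eigenvalues)
    for count, proportion in enumerate(proportions, start=1):
        cumulative += proportion
        # Small tolerance guards against floating-point round-off.
        if cumulative >= threshold - 1e-12:
            return count
    return len(proportions)


# Toy eigenvalues (illustrative only, not data from the study).
toy_eigenvalues = [5.0, 3.0, 1.5, 0.5]
n_components = components_for_variance(toy_eigenvalues, threshold=0.9)
```

With these toy values the first two components carry 80% of the variance and the first three carry 95%, so a 0.9 threshold keeps three components.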

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258326
Author(s):  
Wen Bo Liu ◽  
Sheng Nan Liang ◽  
Xi Wen Qin

Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fail to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), which constructs kernel function weights according to kernel matrix eigenvalues and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine classifiers are used to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.


Author(s):  
A. B Yusuf ◽  
R. M Dima ◽  
S. K Aina

Breast cancer is the second most commonly diagnosed cancer in women throughout the world. It is on the rise, especially in developing countries, where the majority of cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of the breast cells. The absence of accurate prognostic models to help physicians recognize symptoms early makes it difficult to develop a treatment plan that would help patients live longer. However, machine learning techniques have recently been used to improve the accuracy and speed of breast cancer diagnosis. The higher the accuracy, the more efficient the model and the better the solution to breast cancer diagnosis. Nevertheless, the primary difficulty for systems developed to detect breast cancer using machine-learning models is attaining the greatest classification accuracy and picking the most predictive features for increasing accuracy. As a result, breast cancer prognosis remains a difficulty in today's society. This research seeks to address a flaw in an existing technique that is unable to enhance classification of continuous-valued data, particularly its accuracy and the selection of optimal features for breast cancer prediction. To address these issues, this study examines the impact of outliers and feature reduction on the Wisconsin Diagnostic Breast Cancer Dataset, which was tested using seven different machine learning algorithms. The results show that the Logistic Regression, Random Forest, and AdaBoost classifiers achieved the greatest accuracy, 99.12%, on removal of outliers from the dataset. The filtered dataset with feature selection, in turn, achieved accuracies of 100% and 99.12% with the Random Forest and Gradient Boost classifiers, respectively. When compared to other state-of-the-art approaches, the two suggested strategies outperformed the unfiltered data in terms of accuracy.
The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and positives. As a result, the efficiency of breast cancer diagnosis analysis will be increased.


2020 ◽  
Vol 2 (1) ◽  
pp. 96-101
Author(s):  
Ahmad Fauzi ◽  
Riki Supriyadi ◽  
Nurlaelatul Maulidah

Abstract - Screening is an early-detection effort to identify diseases or abnormalities that are not yet clinically apparent by using particular tests, examinations, or procedures. It can be used to quickly distinguish people who appear healthy but actually have an abnormality. The main objective of this study is to improve classification performance in breast cancer diagnosis by applying feature selection to several classification algorithms. This study uses the Breast Cancer Coimbra Data Set. A feature selection method based on principal component analysis is paired with several classification algorithms and methods, such as LogitBoost, Bagging, and Random Forest. This study uses 10-fold cross-validation as the evaluation method. The results show that the feature selection method based on principal component analysis significantly improved classification performance when paired with Random Forest and LogitBoost; Random Forest showed the best performance, with an accuracy of 79.3103% and an AUC of 0.843. Keywords: Feature Selection, PCA, Breast Cancer, Screening, Random Forest
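The 10-fold cross-validation used for evaluation in the abstract above can be sketched as an index splitter. This is a minimal, unstratified illustration and not the authors' evaluation code; the function name `k_fold_indices`, the seed, and the sample count of 25 are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample lands in exactly one test fold; the split is shuffled but
    not stratified by class label.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset size, chosen only for illustration.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be fitted on each `train` list and scored on the matching `test` list, and the ten scores would then be averaged.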


2019 ◽  
Vol 8 (6) ◽  
pp. 891 ◽  
Author(s):  
Annarita Fanizzi ◽  
Liliana Losurdo ◽  
Teresa Maria A. Basile ◽  
Roberto Bellotti ◽  
Ubaldo Bottigli ◽  
...  

Contrast-Enhanced Spectral Mammography (CESM) is a novel instrument for diagnosing breast cancer, but it can still be considered operator dependent. In this paper, we propose a fully automatic system as a diagnostic support tool for clinicians. For each Region Of Interest (ROI), a feature set was extracted from low-energy and recombined images by using different techniques. A Random Forest classifier was trained on a subset of significant features selected by a sequential feature selection algorithm. The proposed Computer-Automated Diagnosis system was tested on 48 ROIs extracted from 53 patients referred to Istituto Tumori “Giovanni Paolo II” of Bari (Italy) from the breast cancer screening phase between March 2017 and June 2018. The method performed well in predicting benign/malignant ROIs, with median values of sensitivity and specificity of 87.5% and 91.7%, respectively. The performance was high compared to the state of the art, even with a moderate/marked level of parenchymal background. Our classification model outperformed the human reader, increasing the specificity by over 8%. Therefore, our system could represent a valid support tool for radiologists for interpreting CESM images, both reducing the false positive rate and limiting biopsies and surgeries.
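The sensitivity and specificity figures reported above follow from the standard confusion-matrix definitions. The sketch below shows that calculation under the assumption that malignant is the positive class and is coded as 1; the function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of positive cases detected.
    specificity = TN / (TN + FP): fraction of negative cases correctly cleared.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (1 = malignant, 0 = benign), for illustration only.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
```

Raising specificity at a fixed sensitivity means fewer benign lesions are flagged, which is why the abstract links it to fewer false positives and fewer biopsies.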


2021 ◽  
Vol 8 (4) ◽  
pp. 372-384
Author(s):  
Sarada Ghosh ◽  
Guruprasad Samanta ◽  
Manuel De la Sen ◽  
...  

DNA microarray technology with biological data sets can monitor the expression levels of thousands of genes simultaneously. Microarray data analysis is important in phenotype classification of diseases. In this work, the computational part predicts the tendency towards mortality using different classification techniques by identifying features from the high-dimensional dataset. We have analyzed breast cancer transcriptional genomic data of 1554 transcripts captured from 272 samples. This work presents effective methods for gene classification using Logistic Regression (LR), Random Forest (RF) and Decision Tree (DT), and constructs a classifier with a higher accuracy than one using all features together. The performance of these methods is also compared with a dimension reduction method, namely Principal Component Analysis (PCA). The feature reduction methods with RF, LR and DT provide better performance than PCA. It is observed that both LR and RF identify the TYMP, ERS1, C-MYB and TUBA1a genes. However, some features corresponding to genes such as ARID4B, DNMT3A, TOX3, RGS17 and PNLIP, which play a significant role in breast cancer, are identified only by the LR method. The simulation is based on R software.


2020 ◽  
Vol 4 (5) ◽  
pp. 805-812
Author(s):  
Riska Chairunisa ◽  
Adiwijaya ◽  
Widi Astuti

Cancer is one of the deadliest diseases in the world, with a mortality rate of 57.3% in 2018 in Asia. Therefore, early diagnosis is needed to avoid an increase in mortality caused by cancer. As machine learning develops, cancer gene data can be processed using microarrays for early detection of cancer. However, microarray data have so many attributes that dimension reduction is necessary. To overcome this problem, this study used Discrete Wavelet Transform (DWT) dimension reduction with Classification and Regression Tree (CART) and Random Forest (RF) as classification methods. The purpose of using these two classification methods is to find out which one produces the best performance when combined with DWT dimension reduction. This research uses five microarray datasets, namely Colon Tumors, Breast Cancer, Lung Cancer, Prostate Tumors and Ovarian Cancer, from the Kent-Ridge Biomedical Dataset. The best accuracies obtained in this study were 76.92% for breast cancer with CART-DWT, 90.1% for colon tumors with RF-DWT, 100% for lung cancer with RF-DWT, 95.49% for prostate tumors with RF-DWT, and 100% for ovarian cancer with RF-DWT. From these results it can be concluded that RF-DWT is better than CART-DWT.
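How a DWT can reduce dimension may be easier to see in a small sketch: each decomposition level keeps only the approximation coefficients, halving the feature count. The abstract does not state which wavelet family or how many levels were used, so the Haar wavelet, the function names, and the toy vector below are assumptions made purely for illustration.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the length
    of the input. An odd-length input is padded by repeating its last value.
    """
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def reduce_dimension(features, levels=1):
    """Keep only the approximation coefficients after `levels` Haar steps."""
    current = list(features)
    for _level in range(levels):
        current, _detail = haar_dwt(current)
    return current


# Toy expression vector (illustrative values, not microarray data).
toy_vector = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
reduced = reduce_dimension(toy_vector, levels=2)  # 8 features -> 2 features
```

The reduced vectors would then be passed to a classifier such as CART or RF in place of the original attributes.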


Choonpa Igaku ◽  
2013 ◽  
Vol 40 (2) ◽  
pp. 167-174 ◽  
Author(s):  
Yukio MITSUZUKA ◽  
Shinsaku KANAZAWA ◽  
Hideaki OGATA ◽  
Kenichi MARUYAMA ◽  
Tsuneyoshi YAKUWA ◽  
...  
