Constructive Effect of Ranking Optimal Features Using Random Forest, Support Vector Machine and Naïve Bayes for Breast Cancer Diagnosis

Author(s):  
B. G. Deepa ◽  
S. Senthil
2021 ◽  
Vol 2021 (1) ◽  
pp. 1012-1018
Author(s):  
Handy Geraldy ◽  
Lutfi Rahmatuti Maghfiroh

In its role as a data provider, Statistics Indonesia (Badan Pusat Statistik, BPS) gives the public access to BPS data. One such service is the search feature on the BPS website. However, the search service does not yet meet consumer expectations. One way to meet those expectations is to improve search effectiveness so that results are more relevant to the user's intent. This study therefore aims to build a query classification function for the search engine and to test whether that function improves search effectiveness. The query classification function is built using machine learning models. We compared five algorithms: SVM, Random Forest, Gradient Boosting, KNN, and Naive Bayes. Of these five, SVM produced the best model. The function was then implemented in the search engine, whose effectiveness was measured by precision and recall. The results show that the query classification function can narrow the search results for certain queries, thereby increasing precision. However, the function does not affect recall.
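As a rough illustration of the evaluation described above, the following Python sketch computes precision and recall for a single query before and after the result list is narrowed to a predicted category. The document IDs, category labels, and relevance judgments are hypothetical values invented for this example, not data from the study, and the filtering rule is only an assumption about how a query classifier might be applied.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: document ID -> content category (made up for illustration).
doc_category = {1: "publication", 2: "table", 3: "table",
                4: "news", 5: "table", 6: "news"}

relevant = {2, 3}                  # hypothetical relevance judgments for the query
unfiltered = [1, 2, 3, 4, 5, 6]    # results returned without query classification

# Assumption: the classifier predicts one category for the query,
# and the engine keeps only documents in that category.
predicted_category = "table"
filtered = [d for d in unfiltered if doc_category[d] == predicted_category]

baseline = precision_recall(unfiltered, relevant)
narrowed = precision_recall(filtered, relevant)
print("baseline:", baseline)
print("narrowed:", narrowed)
```

In this toy case the filter discards only non-relevant documents, so precision rises while recall stays the same, which matches the pattern the abstract reports.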


Author(s):  
T R Stella Mary ◽  
Shoney Sebastian

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Among the various ailments, heart disease is one of the primary causes of death around the globe, so a detailed analysis is carried out using data mining to help curb it. Studies often limit themselves to the minimal set of attributes required to predict whether a patient has heart disease, and in doing so they miss many important attributes that are major causes of heart disease. Hence, this research considers almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and the Naïve Bayes and Random Forest algorithms perform comparatively better on these datasets.
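To make the Naïve Bayes classifier mentioned above more concrete, here is a minimal Python sketch of a Gaussian Naïve Bayes model. The abstract does not say which Naïve Bayes variant was used, so the Gaussian form is an assumption, and the two-feature toy rows below are invented for illustration rather than taken from the heart disease datasets.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        columns = list(zip(*xs))
        means = [sum(col) / n for col in columns]
        # A small constant keeps each variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=lambda y: log_posterior(model[y]))


# Hypothetical toy data: two numeric attributes, binary class label.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2),
        (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (1.1, 2.0)), predict(model, (5.1, 6.1)))
```

Because the model treats attributes as conditionally independent, adding an attribute only adds one more term to the log-likelihood sum, which makes it straightforward to rerun the same classifier on datasets with progressively more attributes.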


PLoS ONE ◽  
2014 ◽  
Vol 9 (1) ◽  
pp. e86703 ◽  
Author(s):  
Wangchao Lou ◽  
Xiaoqing Wang ◽  
Fan Chen ◽  
Yixiao Chen ◽  
Bo Jiang ◽  
...  

2021 ◽  
Author(s):  
Dongxiao Gu ◽  
Wang Zhao ◽  
Xuejie Yang ◽  
Kaixiang Su ◽  
Changyong Liang ◽  
...  

BACKGROUND Artificial intelligence can help physicians improve the accuracy of breast cancer diagnosis. However, the effectiveness of AI applications is limited by doctors’ adoption of the results recommended by the AI systems. A case-based reasoning system for breast cancer diagnosis (CBR-BCD) that considers the effects of external characteristics of cases (ECC) can not only provide doctors with more accurate results for auxiliary diagnosis, but also improve doctors’ trust in the results, so as to encourage doctors to adopt the results recommended by the system.
OBJECTIVE The objective of our study is to develop a novel integrated case-based reasoning (CBR) framework based on Naive Bayes and K-Nearest Neighbor (KNN) algorithms considering the effects of external characteristics of cases (CBR-ECC) and a corresponding system named CBR-BCD to assist in diagnosis and promote adoption by doctors.
METHODS We used a real-world data set from the Maputo Central Hospital in Mozambique and constructed the CBR-ECC model and corresponding CBR-BCD system. We performed data processing and obtained six internal features and three external features of the cases. We randomly divided the 1214 cases into a training group and a testing group. The performance of the model was evaluated by accuracy and the area under the receiver operating characteristic curve (AUC).
RESULTS The system based on the CBR-ECC model was developed. In the first stage of this model, Naive Bayes showed the best performance, compared with KNN and J48 decision tree classifiers, with an accuracy rate of 95.87%. In the second stage, the accuracy of the KNN model with the optimal K value of 2 was 99.40%. In the third stage, after considering the external characteristics of the cases, the rankings of recommendation changed. Finally, we report the users’ evaluation of the novel CBR system in a real hospital scenario; we found that it is superior to the original system.
CONCLUSIONS CBR-BCD not only enables accurate case recommendations to support health practitioners in diagnosing breast cancer and reducing diagnostic inaccuracies, but also facilitates the adoption of system-recommended results by physicians, which is valuable for clinicians to assist in diagnosis. It enables the early screening of breast cancer to improve the quality of breast cancer management and reduces the socioeconomic burden compared to traditional methods.


Author(s):  
Anirudh Reddy Cingireddy ◽  
Robin Ghosh ◽  
Supratik Kar ◽  
Venkata Melapu ◽  
Sravanthi Joginipeli ◽  
...  

Frequent testing of the entire population would help to identify individuals with active COVID-19 and allow us to identify concealed carriers. Molecular tests, antigen tests, and antibody tests are being widely used to confirm COVID-19 in the population. Molecular tests such as the real-time reverse transcription-polymerase chain reaction (rRT-PCR) test take from a minimum of 3 hours to a maximum of 4 days to return results. To overcome this issue, the authors suggest using machine learning and data mining tools to filter large populations at a preliminary level. The ML tools could reduce the testing population size by 20 to 30%. In this study, they used a subset of features from the full blood profile of patients at the Israelita Albert Einstein hospital in Brazil. They used classification models, namely KNN, logistic regression, XGBoost, naive Bayes, decision tree, random forest, support vector machine, and multilayer perceptron, and validated them with k-fold cross-validation. Naïve Bayes, KNN, and random forest stand out as the most predictive, with 88% accuracy each.
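The k-fold cross-validation procedure mentioned above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: a basic k-nearest-neighbor classifier stands in for the models listed in the abstract, the number of folds (5) and neighbors (3) are arbitrary choices, and the two-feature data are synthetic rather than blood-profile measurements from the study.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cross_validate(data, k_folds=5):
    """Return the held-out accuracy for each fold."""
    scores = []
    for held_out in k_fold_indices(len(data), k_folds):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        correct = sum(knn_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Synthetic two-class data (hypothetical): two well-separated Gaussian clusters.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(30)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), 1) for _ in range(30)]

scores = cross_validate(data)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", scores)
print("mean accuracy:", mean_accuracy)
```

Each row is used for testing exactly once and for training in the remaining folds, so the mean of the fold accuracies gives a less optimistic estimate than scoring a model on the data it was trained on.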


2021 ◽  
Vol 12 (10) ◽  
pp. 101202 ◽  
Author(s):  
Abdulwaheed Tella ◽  
Abdul-Lateef Balogun ◽  
Naheem Adebisi ◽  
Samsuri Abdullah
