Glioblastoma gene expression subtypes and correlation with clinical, molecular and immunohistochemical characteristics in a homogenously treated cohort: GLIOCAT project.

2029 Background: Glioblastoma (GBM) gene expression subtypes have been described in recent years, but data in homogeneously treated patients are lacking. Methods: Clinical, molecular and immunohistochemistry (IHC) data from patients with newly diagnosed GBM homogeneously treated with standard radiochemotherapy were analyzed. Samples were classified on the basis of their expression profiles into three subtypes (classical, mesenchymal, proneural) using the Support Vector Machine (SVM), K-nearest neighbor (K-NN) and single sample Gene Set Enrichment Analysis (ssGSEA) classification algorithms provided by the GlioVis web application. Results: The GLIOCAT Project recruited 432 patients from 6 Catalan institutions, all of whom received standard first-line treatment (2004-2015). The best paraffin tissue samples were selected for RNAseq, and reliable data were obtained from 124. Eighty-two cases (66%) were classified into the same subtype by all three classification algorithms. The SVM and ssGSEA algorithms gave the most similar results (87%). No differences in clinical variables were found between the 3 GBM subtypes. The proneural subtype was enriched in IDH1-mutated and G-CIMP-positive tumors. The mesenchymal subtype (SVM) was enriched in unmethylated MGMT tumors (p = 0.008), and the classical subtype (SVM) in methylated MGMT tumors (p = 0.008). Long survivors (> 30 months) were rarely classified as mesenchymal (0-7.5%) and were more frequently classified as proneural (23.1-26.). Known clinical (age, resection, KPS) and molecular (IDH1, MGMT) prognostic factors were confirmed in this series. Overall, no differences in prognosis were observed between the 3 subtypes, but a trend toward worse survival in the mesenchymal subtype was observed with K-NN (9.6 vs 15). The mesenchymal subtype showed lower expression of Olig2 (p < 0.001) and SOX2 (p = 0.003) by IHC, but higher YKL-40 expression (p = 0.023, SVM). The classical subtype, on the other hand, expressed more Nestin (p = 0.004) than the other subtypes (K-NN).
Conclusions: In our study we did not find a correlation between glioblastoma expression subtype and outcome. This large series provides reproducible data on the clinical, molecular and immunohistochemical features of glioblastoma genetic subtypes.
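
The concordance figures in this abstract (82 of 124 cases, 66%, called identically by all three algorithms) are agreement rates between label lists. A minimal Python sketch of that calculation follows; the function names and the five-sample `calls` table are hypothetical illustrations, not GLIOCAT data or GlioVis output.

```python
from itertools import combinations


def full_agreement_rate(calls):
    """Fraction of samples that every classifier assigns to the same subtype.

    `calls` maps a classifier name to its list of labels, one per sample.
    """
    per_sample = list(zip(*calls.values()))
    agree = sum(1 for labels in per_sample if len(set(labels)) == 1)
    return agree / len(per_sample)


def pairwise_agreement(calls):
    """Agreement rate for each pair of classifiers."""
    rates = {}
    for a, b in combinations(calls, 2):
        same = sum(1 for x, y in zip(calls[a], calls[b]) if x == y)
        rates[(a, b)] = same / len(calls[a])
    return rates


# Hypothetical subtype calls for five samples (illustration only).
calls = {
    "SVM":    ["classical", "mesenchymal", "proneural", "classical", "mesenchymal"],
    "K-NN":   ["classical", "mesenchymal", "classical", "classical", "proneural"],
    "ssGSEA": ["classical", "mesenchymal", "proneural", "classical", "mesenchymal"],
}

print(full_agreement_rate(calls))
print(pairwise_agreement(calls))
```

Applied to the reported counts, the same ratio gives 82 / 124 ≈ 0.66, which matches the 66% stated in the abstract.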


Plant disease prediction using classification algorithms

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp257-264 ◽

2021 ◽

Vol 10 (1) ◽

pp. 257

Author(s):

Maria Morgan ◽

Carla Blank ◽

Raed Seetan

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Nearest Neighbor ◽

Disease Classification ◽

Future Research ◽

Support Vector ◽

Classification Algorithms ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Artificial Neural

This paper investigates the capability of six existing classification algorithms (Artificial Neural Network, Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree and Random Forest) in classifying and predicting diseases in soybean and mushroom datasets with numerical or categorical attributes. While many similar studies have used image datasets to predict plant diseases, the main objective of this study is to suggest classification methods that can be used for disease classification and prediction in datasets that contain raw measurements instead of images. A fungus dataset and a plant dataset, which differ in many respects, were chosen so that the findings in this paper could be applied to future research on disease prediction and classification in a variety of datasets that contain raw measurements. A key difference between the two datasets, other than one describing a fungus and the other a plant, is that the mushroom dataset is balanced and contains only two classes, while the soybean dataset is imbalanced and contains eighteen classes. All six algorithms performed well on the mushroom dataset, while the Artificial Neural Network and k-Nearest Neighbor algorithms performed best on the soybean dataset. The findings of this paper can be applied to future research on disease classification and prediction in a variety of dataset types, such as fungi, plants, humans, and animals.


COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study aims to measure and compare the performance of five machine-learning-based text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison addresses the effectiveness of the algorithms, that is, their ability to assign documents to the correct category, using the holdout (percentage split) method. The effectiveness measures used are precision, recall, F-measure, and accuracy. The experimental results show that, for the naïve Bayes algorithm, the larger the percentage of training documents, the higher the accuracy of the resulting model. The highest accuracy was obtained at a 90/10 split for naïve Bayes, at 80/20 for SVM, and at 70/30 for decision tree. The results also show that naïve Bayes had the highest effectiveness among the five algorithms tested and the fastest classification model building time, 0.02 seconds. The decision tree algorithm classified text documents with higher accuracy than SVM, but its model took longer to build. In terms of model building time, k-NN was the fastest, but its accuracy was lower.
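
The holdout (percentage split) procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `holdout_split` and `accuracy`, the fixed random seed, and the placeholder document list are hypothetical and are not taken from the paper.

```python
import random


def holdout_split(items, train_fraction, seed=0):
    """Shuffle `items` and split them into (train, test) by `train_fraction`."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Share of documents assigned to their correct category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder corpus of 100 document identifiers (illustration only).
docs = [f"doc{i}" for i in range(100)]

# The three training/test percentages mentioned in the abstract.
for train_pct in (70, 80, 90):
    train, test = holdout_split(docs, train_pct / 100)
    print(f"{train_pct}/{100 - train_pct} split:", len(train), "train,", len(test), "test")
```

A classifier would be fitted on `train` and scored on `test` with `accuracy`; repeating this for each percentage reproduces the kind of comparison the study reports.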


A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly if they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset fed five different models (Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF)). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently to the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
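
Of the metrics listed above, Kappa is the least self-explanatory: Cohen's kappa corrects the observed agreement between predicted and expert labels for the agreement expected by chance. The sketch below computes it from two label lists; the function name and the six driving-style labels are hypothetical and do not come from the study's dataset.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both label lists hold a single class")
    return (observed - expected) / (1.0 - expected)


# Hypothetical expert labels and model predictions (illustration only).
y_true = ["calm", "calm", "aggressive", "aggressive", "normal", "normal"]
y_pred = ["calm", "calm", "aggressive", "normal", "normal", "normal"]

print(cohen_kappa(y_true, y_pred))
```

In this toy case five of six labels match (observed agreement 5/6) and chance agreement is 1/3, so kappa is 0.75, noticeably lower than the raw accuracy.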


Service complaint identification in hotel social media: A two-step classification approach

International Journal of Electrical Engineering Education ◽

10.1177/0020720920928467 ◽

2020 ◽

pp. 002072092092846

Author(s):

Jiahua Jin ◽

Lu Lu

Keyword(s):

Social Media ◽

Nearest Neighbor ◽

Support Vector ◽

Classification Algorithms ◽

Construction Process ◽

K Nearest Neighbor ◽

Classification Approach ◽

Consumer Complaint ◽

Training Samples ◽

Binary Classifiers

Hotel social media provides access to dissatisfied customers and their experiences with services. However, because of the massive number of topics and posts on social media and the sparse distribution of complaint-related posts, manually identifying complaints is inefficient and time-consuming. In this study, we propose a supervised learning method that comprises training sample enlargement and classifier construction. We first identified reliable complaint and noncomplaint samples from the unlabeled dataset, using a small set of labeled samples as training samples. Combining the labeled samples and the enlarged samples, the support vector machine and k-nearest neighbor classification algorithms were then adopted to build binary classifiers during the classifier construction process. Experimental results indicate that the proposed method can identify complaints from social media efficiently, especially when the number of labeled training samples is small. This study provides an efficient approach for hotel companies to distinguish a specific kind of consumer complaint information from a large volume of unrelated information in hotel social media.


A New Algorithm for Analysis of MiRNA Expression Profiles—SVM-RFE-FKNN

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2021.65.3.030407 ◽

2021 ◽

Author(s):

Duan Mei ◽

Qiang Liu

Keyword(s):

Mirna Expression ◽

Nearest Neighbor ◽

Expression Profiles ◽

Binary Classification ◽

Characteristic Curve ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Support Vector ◽

K Nearest Neighbor ◽

Mirna Expression Profiles

Based on microRNA (miRNA) expression profiles, this article proposes a new algorithm, SVM-RFE-FKNN, which combines the support vector machine-recursive feature elimination (SVM-RFE) algorithm and the fuzzy K-nearest neighbor (FKNN) algorithm to perform binary classification of tumors. First, the SVM-RFE algorithm was used to select features from the miRNA expression profile dataset to form feature subsets and to determine the maximum number of support vectors. Next, this maximum number was taken as the upper limit of the parameter K in the FKNN algorithm, which was then used to classify the test samples. Finally, leave-one-out cross-validation was used to assess the classification performance of the proposed algorithm. In experiments, the proposed algorithm was compared with twelve other classification methods, and the results show that it had better classification performance. Specifically, with only a few miRNA biomarkers, the proposed algorithm reached an accuracy of 99.46% and an area under the receiver operating characteristic curve (AUC) of 0.9874.
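
To make the FKNN and leave-one-out steps concrete, here is a minimal Python sketch. It assumes the common distance-weighted fuzzy K-NN rule with crisp training labels (each neighbour votes with weight 1 / distance^(2/(m-1))); the abstract does not state which FKNN variant the authors used, the SVM-RFE feature selection is not reproduced, and the two-feature toy data are hypothetical.

```python
import math


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Class memberships of `query` computed from its k nearest training points.

    Each neighbour votes for its own label with weight
    1 / distance**(2 / (m - 1)); memberships are normalised to sum to 1.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = {}
    for d, label in neighbours:
        w = 1.0 / max(d, 1e-12) ** exponent  # guard against a zero distance
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict(train_x, train_y, query, k=3, m=2.0):
    """Label with the highest fuzzy membership."""
    memberships = fuzzy_knn_memberships(train_x, train_y, query, k, m)
    return max(memberships, key=memberships.get)


def leave_one_out_accuracy(xs, ys, k=3, m=2.0):
    """Hold out each sample in turn, train on the rest, and count correct calls."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += predict(rest_x, rest_y, xs[i], k, m) == ys[i]
    return hits / len(xs)


# Hypothetical two-feature expression values (illustration only).
xs = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
ys = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

print(leave_one_out_accuracy(xs, ys, k=3))
```

In the article's scheme, `k` would be bounded above by the maximum number of support vectors found during SVM-RFE; here it is simply passed in as a parameter.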


An Ensemble-Based Feature Selection and Classification of Gene Expression using Support Vector Machine, K-Nearest Neighbor, Decision Tree

2019 International Conference on Communication and Electronics Systems (ICCES) ◽

10.1109/icces45898.2019.9002041 ◽

2019 ◽

Author(s):

Anu J Nair ◽

Rizwana Rasheed ◽

KM Maheeshma ◽

LS Aiswarya ◽

K R Kavitha

Keyword(s):

Gene Expression ◽

Support Vector Machine ◽

Feature Selection ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor


Diabetes Prediction Using Machine Learning Techniques

Journal of Intelligent Systems with Applications ◽

10.54856/jiswa.202112183 ◽

2021 ◽

pp. 150-152

Author(s):

Seyma Kiziltas Koc ◽

Mustafa Yeniad

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

High Performance ◽

Nearest Neighbor ◽

Classification Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

The technologies used in the healthcare industry are changing rapidly, because technology is constantly evolving to improve people's lifestyles. For instance, a variety of technological devices are used for the diagnosis and treatment of diseases, and advances in technology have shown that computer systems can diagnose disease. Machine learning algorithms are frequently used tools because of their high performance in healthcare as well as in many other fields. The aim of this study is to investigate different machine learning classification algorithms that can be used in the diagnosis of diabetes and to carry out comparative analyses according to the metrics used in the literature. Seven classification algorithms from the literature were used in the study: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine and Naive Bayes. The classification performance of the algorithms was compared on the basis of accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.
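
The four comparison metrics named above all derive from the binary confusion counts (true/false positives and negatives). The following Python sketch shows the standard definitions; the function name and the ten example labels are hypothetical and are unrelated to the diabetes dataset used in the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), precision and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(binary_metrics(y_true, y_pred))
```

With 3 true positives, 1 false negative, 1 false positive and 5 true negatives, accuracy is 0.8 while sensitivity, precision and F1 are all 0.75, which illustrates why a ranking by accuracy alone can differ from a ranking by the other metrics.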


Performance of Classifiers on Newsgroups using Specific Subset of Terms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4652.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2497-2500

Keyword(s):

Text Classification ◽

Text Categorization ◽

Nearest Neighbor ◽

Vital Role ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Specific Subset ◽

The World ◽

Good Classification

Text classification plays a vital role in data mining, and classification algorithms play an equally vital role in text categorization. There are many techniques for text classification, but this paper focuses on three approaches: support vector machine (SVM), Naïve Bayes (NB), and k-nearest neighbor (k-NN). The paper reports the results of these classifiers on the mini-newsgroups data, which consists of a large number of documents, and describes the step-by-step tasks: listing the files, preprocessing, creating terms (a specific subset of terms), and applying the classifiers to specific subsets of the dataset. Finally, from the experiments on the dataset, it is concluded that SVM achieves good classification output in terms of accuracy, precision, F-measure and recall, whereas the k-NN approach has good execution time.


Performance Research on Medical Data Classification using Traditional and Soft Computing Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1185.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 990-995

Keyword(s):

Data Mining ◽

Soft Computing ◽

Nearest Neighbor ◽

Classification Performance ◽

Medical Data ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Classification Techniques ◽

Soft Computing Techniques

The world today has made giant leaps in the field of medicine. A tremendous amount of research is being carried out in this field, leading to new discoveries that have a heavy impact on mankind. The data generated in this field are increasing enormously, and a need has arisen to analyze these data in order to find meaningful and relevant hidden patterns. These patterns can be used for clinical diagnosis. Data mining is an efficient approach to discovering these patterns. Among the many data mining techniques that exist, this paper aims at analyzing medical data using various classification techniques. The classification techniques used in this study include k-Nearest Neighbor (kNN), Decision Tree and Naive Bayes, which are hard computing algorithms, whereas the soft computing algorithms used include Support Vector Machine (SVM), Artificial Neural Networks (ANN) and Fuzzy k-Means clustering. We applied these algorithms to three datasets: Breast Cancer Wisconsin, Haberman Data and Contraceptive Method Choice. Our results show that soft computing based classification algorithms yield better classifications than the traditional classification algorithms in terms of various classification performance measures.


Classifier Performance Evaluation in Wrist and Finger Movement Fitting Task Based on Forearm HD-sEMG

10.21203/rs.3.rs-1088094/v1 ◽

2021 ◽

Author(s):

Haiqiang Duan ◽

Chenyun Dai ◽

Wei Chen

Keyword(s):

Classification Accuracy ◽

Nearest Neighbor ◽

Hand Movement ◽

The Other ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Superposition ◽

Semg Signal ◽

Linear Discriminant ◽

Fitting In

Abstract Background: The transmission of human body movements to other devices through wearable smart bracelets has attracted increasing attention in the field of human-machine interface (HMI) applications. However, because the collection range of wearable bracelets is limited, it is necessary to study the relationship between the superposition of wrist and finger motions and their cooperative motion in order to simplify the collection system of the device. Methods: The multi-channel high-density surface electromyogram (HD-sEMG) signal has high spatial resolution and can improve the accuracy of multi-channel fitting. In this study, we quantified the HD-sEMG forearm spatial activation features of 256 channels during hand movement, and performed a linear fitting of the quantified features of finger and wrist movements to verify the linear superposition relationship between cooperative finger and wrist movements and their independent movements. Most importantly, the fitted results and the actually measured cooperative finger and wrist actions were classified and predicted by four commonly used classifiers, Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest (RF), and the performance of the four classifiers in gesture fitting was evaluated in detail according to the classification results. Results: For a total of 12 kinds of synthetic gesture actions, the LDA, SVM, RF and KNN classifiers were used for classification prediction in three cases, with 8, 32 and 64 fitting channels. With 8 fitting channels, the accuracy was 99.70% for LDA, 99.40% for KNN, 99.20% for SVM and 93.75% for RF. With 32 fitting channels, the accuracy was 98.51% for LDA, 97.92% for KNN, 96.73% for SVM and 86.61% for RF. With 64 fitting channels, the accuracy was 95.83% for LDA, 91.67% for KNN, 86.90% for SVM and 83.30% for RF. Conclusion: The results show that with 8 fitting channels the classification accuracy of the LDA, KNN and SVM classifiers is basically the same, but the time consumed by SVM is very small. When the amount of data is large, SVM should therefore be preferred as the classifier. As the number of fitting channels increases, the classification accuracy of the LDA classifier becomes higher than that of the other three classifiers, so the LDA classifier is more appropriate. The classification accuracy of the RF classifier in this type of problem was consistently far lower than that of the other three classifiers, so the RF classifier is not recommended for gesture superposition work.
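
The linear superposition test described in the Methods can be illustrated with an ordinary least-squares fit. The sketch below assumes a model of the form combined ≈ a·finger + b·wrist with no intercept, applied to one feature value per channel; the abstract does not specify the exact fitting model, so this form, the function names and the eight channel values are all hypothetical.

```python
def fit_superposition(finger, wrist, combined):
    """Least-squares a, b such that combined ≈ a*finger + b*wrist (no intercept)."""
    ff = sum(f * f for f in finger)
    ww = sum(w * w for w in wrist)
    fw = sum(f * w for f, w in zip(finger, wrist))
    fc = sum(f * c for f, c in zip(finger, combined))
    wc = sum(w * c for w, c in zip(wrist, combined))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = ff * ww - fw * fw
    if abs(det) < 1e-12:
        raise ValueError("finger and wrist features are collinear")
    a = (fc * ww - wc * fw) / det
    b = (wc * ff - fc * fw) / det
    return a, b


def r_squared(finger, wrist, combined, a, b):
    """Share of the variance in `combined` explained by the fitted superposition."""
    mean_c = sum(combined) / len(combined)
    ss_res = sum((c - (a * f + b * w)) ** 2 for f, w, c in zip(finger, wrist, combined))
    ss_tot = sum((c - mean_c) ** 2 for c in combined)
    return 1.0 - ss_res / ss_tot


# Hypothetical activation features for 8 channels (illustration only).
finger = [1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 1.0]
wrist = [2.0, 0.0, 1.0, 3.0, 4.0, 1.0, 0.0, 2.0]
combined = [0.6 * f + 0.5 * w for f, w in zip(finger, wrist)]

a, b = fit_superposition(finger, wrist, combined)
print(a, b, r_squared(finger, wrist, combined, a, b))
```

Because the toy `combined` signal is built as an exact superposition, the fit recovers coefficients close to 0.6 and 0.5; with measured data, the size of the residual indicates how well the superposition assumption holds.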
