THE EFFECT OF POPULATION CONTEXTS ON CLASSIFIER PERFORMANCE

2008 ◽  
Vol 16 (04) ◽  
pp. 495-517 ◽  
Author(s):  
ASHISH CHOUDHARY ◽  
JIANPING HUA ◽  
MICHAEL L. BITTNER ◽  
EDWARD R. DOUGHERTY

Classifying a patient based on disease type, treatment prognosis, survivability, or other such criteria has become a major focus of genomics and proteomics. From the perspective of the general population of a particular kind of cell, one would like a classifier that applies to the whole population; however, it is often the case that the population is sufficiently structurally diverse that a satisfactory classifier cannot be designed from available sample data. In such a circumstance, it can be useful to identify cellular contexts within which a disease can be reliably diagnosed, which in effect means that one would like to find classifiers that apply to different sub-populations within the overall population. Using a model-based approach, this paper quantifies the effect of contexts on classification performance as a function of the classifier used and the sample size. The advantage of a model-based approach is that we can vary the contextual confusion as a function of the model parameters, thereby allowing us to compare the classification performance in terms of the degree of discriminatory confusion caused by the contexts. We consider five popular classifiers: linear discriminant analysis, three nearest neighbor, linear support vector machine, polynomial support vector machine, and Boosting. We contrast the case where classification is done with a single classifier without discriminating between the contexts to the case where there are context markers that facilitate context separation before classifier design. We observe that little can be done if there is high contextual confusion, but when the contextual confusion is low, context separation can be beneficial, the benefit depending on the classifier.
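The contrast between a single pooled classifier and context separation before classifier design can be illustrated with a toy sketch. The code below is not the model or any of the classifiers from the paper: the one-dimensional data, the context labels "A" and "B", and the nearest-centroid rule are all hypothetical choices made for illustration, and accuracy is measured on the training points themselves. The class ordering is reversed between the two contexts, so a pooled classifier is confused while per-context classifiers are not.

```python
# Toy illustration only: hypothetical 1-D data, not the model from the paper.

def centroid_fit(samples):
    """Return the mean feature value of each class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def centroid_predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def count_hits(centroids, samples):
    """Count samples whose predicted label matches the true label."""
    return sum(centroid_predict(centroids, x) == label for x, label in samples)

# Each record is (context marker, feature value, class label).
# In context A class 1 has the higher values; in context B the order is reversed.
data = (
    [("A", x, 0) for x in (-0.2, 0.0, 0.2)]
    + [("A", x, 1) for x in (1.8, 2.0, 2.2)]
    + [("B", x, 0) for x in (1.8, 2.0, 2.2)]
    + [("B", x, 1) for x in (0.3, 0.5, 0.7)]
)

# Case 1: one classifier designed on the pooled data, ignoring the context marker.
pooled = [(x, label) for _, x, label in data]
pooled_acc = count_hits(centroid_fit(pooled), pooled) / len(pooled)

# Case 2: use the context marker to split the data, then design one classifier per context.
separated_hits = 0
for context in ("A", "B"):
    subset = [(x, label) for c, x, label in data if c == context]
    separated_hits += count_hits(centroid_fit(subset), subset)
separated_acc = separated_hits / len(data)

print(f"pooled accuracy:    {pooled_acc:.2f}")
print(f"separated accuracy: {separated_acc:.2f}")
```

In this constructed example the benefit of separation is extreme by design; the abstract's point is that the size of the benefit depends on the degree of contextual confusion and on the classifier.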

Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2814 ◽  
Author(s):  
Xiaoguang Liu ◽  
Huanliang Li ◽  
Cunguang Lou ◽  
Tie Liang ◽  
Xiuling Liu ◽  
...  

Falls are the major cause of fatal and non-fatal injury among people aged over 65 years. Because of the grave consequences of falls, thorough research on them is necessary. This paper presents a method for fall detection using surface electromyography (sEMG) based on an improved dual parallel channels convolutional neural network (IDPC-CNN). The proposed IDPC-CNN model is designed to distinguish falls from daily activities using the spectral features of sEMG. First, the classification accuracies of time-domain features and spectrograms are compared using linear discriminant analysis (LDA), k-nearest neighbor (KNN) and support vector machine (SVM). Results show that spectrograms provide a richer way to extract pattern information and give better classification performance. Therefore, the spectrogram features of sEMG are selected as the input of the IDPC-CNN to distinguish between daily activities and falls. Finally, the IDPC-CNN is compared with SVM and three CNNs of different structures under the same conditions. Experimental results show that the proposed IDPC-CNN achieves 92.55% accuracy, 95.71% sensitivity and 91.7% specificity. Overall, the IDPC-CNN is more effective than the compared methods in accuracy, efficiency, training and generalization.


2015 ◽  
Vol 740 ◽  
pp. 600-603
Author(s):  
You Jun Yue ◽  
Yan Fei Hu ◽  
Hui Zhao ◽  
Hong Jun Wang

Establishing an accurate prediction model of the blast furnace coke rate is important for optimizing the integrated production indicators of iron and steel enterprises. To address the accuracy of coke rate models, this paper models the blast furnace coke rate with the support vector machine (SVM) algorithm. The SVM model parameters are optimized by a genetic algorithm, and a coke rate model based on the SVM with the best parameters is then built. Simulation results show that the forecasts of the genetic-algorithm-optimized SVM model have a small average absolute error and a small mean relative error. The coke rate model based on the genetic-algorithm-optimized support vector machine has a high degree of accuracy and a certain practicality.


Author(s):  
Seyma Kiziltas Koc ◽  
Mustafa Yeniad

Technologies used in the healthcare industry are changing rapidly because technology is constantly evolving to improve people's lifestyles. For instance, different technological devices are used for the diagnosis and treatment of diseases. It has been shown that, with developing technology, diseases can be diagnosed by computer systems. Machine learning algorithms are frequently used tools because of their high performance in the field of health as well as in many other fields. The aim of this study is to investigate different machine learning classification algorithms that can be used in the diagnosis of diabetes and to make comparative analyses according to the metrics in the literature. In the study, seven classification algorithms from the literature were used: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine and Naive Bayes. First, the classification performance of the algorithms is compared. These comparisons are based on accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.
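As a reminder of how the comparison metrics named above are defined, the following minimal sketch computes them from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study; specificity is included for completeness although the abstract does not list it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)      # fraction of predicted positives that are correct
    sensitivity = tp / (tp + fn)    # also called recall or true positive rate
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        # F1-score is the harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can hide an imbalance between the two error types, which is why comparisons of this kind usually report sensitivity, precision and F1-score alongside it.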


In recent years, research on age prediction from face pictures has received increasing attention, owing to its important applications in security control and human-computer interaction. Age prediction incorporates two processes: feature extraction and prediction by machine learning. In facial feature extraction, accurately and robustly locating the feature points is complex and has become a challenging issue in age prediction. The Active Shape Model (ASM) can extract the facial shape effectively and correctly. Furthermore, as an improvement on ASM, the Active Appearance Model (AAM) was proposed to extract both shape and texture features from facial images simultaneously. In this paper, the two models are tested and their performance is compared across 6 algorithms: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Support Vector Regression (SVR), Canonical Correlation Analysis (CCA), Linear Discriminant Analysis (LDA), and Projection Twin Support Vector Machine (PTSVM). The experiments show that ASM is faster and yields more precise results than AAM.


2019 ◽  
Vol 9 (1) ◽  
pp. 50-59
Author(s):  
Wijanarto Wijanarto ◽  
Rhatna Puspitasari

Diabetes mellitus is a disease that occurs when the level of glucose in the blood rises too high. World Health Organization (WHO) data show that Indonesia ranks fourth in the world in the number of people with diabetes, a number that has risen to 14 million people. This increase in diabetes mellitus cases calls for efforts at early control and prevention of the disease. In this study, a binary classification algorithm for diabetes mellitus is optimized through the following stages: observation, visualization, descriptive statistics of the dataset, dataset pre-processing, baseline model selection, model parameter tuning, and model finalization. The baseline model is selected by finding the highest accuracy among 3 linear algorithms (Logistic Regression, Linear Discriminant Analysis, K-nearest neighbor) or 3 non-linear algorithms (Decision Tree, Naïve Bayes, Support Vector Machine) based on their parameter tuning. The algorithm that yields optimal accuracy is the Support Vector Machine, which is therefore adopted as the final model. With parameter C set to 47 and an rbf kernel, it gives a mean accuracy of 77.3% on the training data and 74.5% on the testing data, while the confusion matrix gives precision 78%, recall 83%, F1-score 81%, and error rate 25%.


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes a lot to anticancer drug design, saves researchers time, and is cost-effective. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

