classifier selection
Recently Published Documents


TOTAL DOCUMENTS

207
(FIVE YEARS 43)

H-INDEX

23
(FIVE YEARS 3)

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7791
Author(s):  
Ge Gao ◽  
Zhixin Li ◽  
Zhan Huan ◽  
Ying Chen ◽  
Jiuzhen Liang ◽  
...  

With the rapid development of computing and sensor technology, inertial sensor data have been widely used in human activity recognition. Most current studies divide human activities into basic actions and transitional actions: basic actions are classified using a unified set of features, while transitional actions are usually categorized using context information. Because no single existing method recognizes human activities well, this paper proposes a human activity classification and recognition model based on smartphone inertial sensor data. The model accounts for differences in the features of actions with different properties. It segments the inertial sensor data for each type of action with a fixed sliding window, then extracts features and performs recognition with different classifiers. The experimental results show that dynamic and transitional actions achieved the best recognition performance with support vector machines, while static actions were classified better by ensemble classifiers. As for feature selection, frequency-domain features achieved a high recognition rate for dynamic actions, up to 99.35%, whereas time-domain features gave higher recognition rates for static and transitional actions, 98.40% and 91.98%, respectively.
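The segmentation and feature-extraction steps described above can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' implementation: the window length, the 50% overlap, and the particular time-domain features (mean, standard deviation, minimum, maximum) and frequency-domain features (spectral energy, dominant frequency bin) are illustrative choices, since the abstract does not list them.

```python
import math

def sliding_windows(signal, window, step):
    """Split a 1-D signal into fixed-length windows; an incomplete tail is dropped."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]

def time_domain_features(w):
    """Simple statistics over one window (an illustrative choice of features)."""
    n = len(w)
    mean = sum(w) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in w) / n)
    return {"mean": mean, "std": std, "min": min(w), "max": max(w)}

def frequency_domain_features(w):
    """Spectral energy and dominant frequency bin from a naive DFT (DC bin excluded)."""
    n = len(w)
    mags = []
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(w))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(w))
        mags.append(math.hypot(re, im))
    energy = sum(m * m for m in mags) / n
    dominant_bin = 1 + max(range(len(mags)), key=mags.__getitem__)
    return {"energy": energy, "dominant_bin": dominant_bin}

# Toy accelerometer axis: a sinusoid completing 2 cycles every 8 samples.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
for w in sliding_windows(signal, window=8, step=4):  # 50% overlap
    print(time_domain_features(w), frequency_domain_features(w))
```

In a full pipeline, each window's feature vector would then go to the classifier the abstract reports as best for its action type (SVM for dynamic and transitional actions, an ensemble for static ones). The naive O(n²) DFT keeps the sketch dependency-free; a real implementation would use an FFT.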


Author(s):  
Donald Douglas Atsa'am ◽  
Ruth Wario

The coronavirus disease 2019 (COVID-19) pandemic is an ongoing concern that requires research across all disciplines to curb its spread. Nine classification algorithms were evaluated to find the most appropriate one for predicting the prevalent COVID-19 transmission mode in a geographic area: multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were used for classification. Predictive accuracy was determined using 10-fold cross-validation with three repeats. Friedman's test showed that the performances of the algorithms differ significantly. Stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should help health informaticians, health analysts, and others decide which machine learning tool to adopt when trying to detect the dominant transmission mode of the virus within localities.
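The evaluation protocol above (10-fold cross-validation repeated three times, followed by Friedman's test) can be sketched in dependency-free Python. This is an illustrative sketch, not the authors' code: the fold assignment, the seed, and the assumption that each row of `scores` is one block (for example, one dataset or one resample) are ours, since the abstract does not say which blocks were ranked.

```python
import random

def repeated_kfold_indices(n, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists for k-fold CV, reshuffled for each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, folds[i]

def average_ranks(row):
    """Rank scores within one block (1 = best), averaging the ranks of ties."""
    order = sorted(range(len(row)), key=lambda a: -row[a])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_statistic(scores):
    """scores[b][a] = accuracy of algorithm a on block b.
    Returns chi2_F = 12 / (n k (k+1)) * sum_a R_a^2 - 3 n (k+1), without tie correction."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

For example, if three algorithms keep the same ordering on four blocks, `friedman_statistic([[0.9, 0.8, 0.7]] * 4)` returns 8.0, the maximum n(k-1) for n = 4 and k = 3. The statistic is compared with a chi-square distribution with k-1 degrees of freedom; in practice, `scipy.stats.friedmanchisquare` computes both the statistic (with a tie correction) and its p-value.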


2021 ◽  
Vol 11 (10) ◽  
pp. 1274
Author(s):  
Xiangyu Qian ◽  
Ye Qiu ◽  
Qingzu He ◽  
Yuer Lu ◽  
Hai Lin ◽  
...  

Multiple types of sleep arousal account for a large proportion of the causes of sleep disorders. Detecting sleep arousals is very important for diagnosing sleep disorders and reducing the risk of further complications, including heart disease and cognitive impairment. Sleep arousals are scored manually by sleep experts, who inspect several periods of polysomnography (PSG) recordings; this is a time-consuming and tedious task. Therefore, an efficient, fast, and reliable system for automatic sleep arousal detection from PSG could greatly help clinicians. This paper reviews recent automatic arousal detection methods, which are based on statistical rules or on deep learning. Statistical detection methods typically involve three key steps: preprocessing, feature extraction, and classifier selection. For deep learning methods, the models discussed include the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory network (LSTM), residual neural network (ResNet), and combinations of these networks. The predictions of these neural network models are close to the judgments of human experts, and these methods have shown robust generalization across different datasets. We therefore conclude that deep neural networks will be the main research approach for automatic arousal detection in the future.


Author(s):  
Makarand Velankar ◽  
Vaibhav Khatavkar ◽  
Vinayak Jagtap ◽  
Parag Kulkarni

Features play a crucial role in many computational tasks. Feature values are the inputs from which machine learning algorithms make predictions, and prediction accuracy depends on various factors, such as the choice of dataset, features, and machine learning classifiers. Various feature selection and reduction approaches are tested to obtain better accuracy and reduce computational overhead. Feature engineering is the design of new features suited to a specific task with the help of domain knowledge. The challenges of feature engineering are presented for the computational music domain, using music emotion recognition as a case study. Experiments with different combinations of feature sets and machine learning classifiers test the accuracy of the proposed model, and the results for music emotion recognition provide insights into the role of features and classifiers in prediction accuracy. Different machine learning classifiers gave varied results, so the choice of classifier is also an important decision in the proposed model. The engineered features designed with the help of domain experts improved the results, which emphasizes the need for domain-specific feature engineering to improve prediction accuracy. Approaches for designing an optimized model with an appropriate feature set and classifier for machine learning tasks are presented.
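The point that accuracy depends jointly on the feature set and the classifier can be illustrated by scoring every (feature set, classifier) pair and keeping the best. A minimal sketch in standard-library Python, assuming leave-one-out accuracy as the criterion and two simple classifiers (1-nearest neighbour and nearest centroid); the feature sets, classifiers, emotion labels and data are hypothetical, not the authors' setup.

```python
import itertools
import math

def one_nn(train_X, train_y, x):
    """Predict the label of the closest training sample (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class mean is closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def loo_accuracy(X, y, classify):
    """Leave-one-out accuracy of a classify(train_X, train_y, x) function."""
    hits = 0
    for i in range(len(X)):
        prediction = classify(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        hits += prediction == y[i]
    return hits / len(X)

def select_model(X, y, feature_sets, classifiers):
    """Score every (feature set, classifier) pair; return the best pair and all scores."""
    results = {}
    pairs = itertools.product(feature_sets.items(), classifiers.items())
    for (fs_name, cols), (clf_name, clf) in pairs:
        X_subset = [[row[c] for c in cols] for row in X]
        results[(fs_name, clf_name)] = loo_accuracy(X_subset, y, clf)
    return max(results, key=results.get), results  # ties: first pair wins

# Hypothetical data: column 0 separates the two emotions, column 1 is noise.
X = [[0.0, 5.0], [0.1, -5.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [1.2, -5.0]]
y = ["calm", "calm", "calm", "happy", "happy", "happy"]
best, scores = select_model(
    X, y,
    feature_sets={"engineered": (0,), "generic": (1,)},
    classifiers={"1-NN": one_nn, "centroid": nearest_centroid},
)
print(best, scores)
```

Exhaustive scoring grows with the product of the numbers of feature sets and classifiers, so it suits small comparison grids like those in a case study.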


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Shu-Tong Xie ◽  
Zong-Bao He ◽  
Qiong Chen ◽  
Rong-Xin Chen ◽  
Qing-Zhao Kong ◽  
...  

Blended online and offline teaching, regarded as the future trend of higher education, has recently been widely adopted by colleges around the globe. In this article, we analyze students’ learning behavior and predict student performance using behavior-log data from three consecutive years of blended teaching in a college “Java Language Programming” course. First, data from diverse platforms such as MOOC, Rain Classroom, PTA, and cnBlog are integrated and preprocessed. Second, a novel multiclass classification framework combining the genetic algorithm (GA) and the error-correcting output codes (ECOC) method is developed to predict students’ grade levels. In this framework, the GA performs both feature selection and binary classifier selection to fit the ECOC models. Finally, key factors affecting grades are identified from the optimal feature subset selected by the GA and analyzed for their teaching significance. The results show that the multiclass classification algorithm designed in this article predicts grades effectively compared with other algorithms. In addition, the selected subset of features corresponding to learning behaviors is pedagogically instructive.
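Two components of the framework can be sketched in standard-library Python: ECOC decoding, in which each binary classifier outputs one bit and the predicted grade is the class whose codeword is nearest in Hamming distance, and one possible GA chromosome that encodes both a feature mask and a classifier choice for each ECOC column. The 4-class exhaustive code matrix, the grade labels, and the chromosome layout are illustrative assumptions; the abstract gives neither the authors' code matrix nor their encoding, and the GA fitness function (training the chosen binary classifiers on the selected features and scoring the decoded predictions) is omitted.

```python
import random

GRADES = ["A", "B", "C", "D"]  # hypothetical grade levels
# Exhaustive ECOC code for 4 classes (7 binary problems). Every pair of rows
# differs in 4 positions, so one wrong binary prediction is still corrected.
CODE = [
    (1, 1, 1, 1, 1, 1, 1),
    (0, 0, 0, 0, 1, 1, 1),
    (0, 0, 1, 1, 0, 0, 1),
    (0, 1, 0, 1, 0, 1, 0),
]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits):
    """Return the grade whose codeword is closest to the binary predictions."""
    best = min(range(len(CODE)), key=lambda r: hamming(CODE[r], bits))
    return GRADES[best]

def random_chromosome(n_features, n_columns, n_classifiers, rng):
    """Hypothetical encoding: feature-mask bits, then one classifier id per ECOC column."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    choices = [rng.randrange(n_classifiers) for _ in range(n_columns)]
    return mask + choices

def split_chromosome(chrom, n_features):
    """Return (indices of selected features, classifier id for each column)."""
    mask, choices = chrom[:n_features], chrom[n_features:]
    return [i for i, bit in enumerate(mask) if bit], choices

def mutate(chrom, n_features, n_classifiers, rate, rng):
    """Flip feature bits and re-draw classifier ids, each with probability `rate`."""
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            if i < n_features:
                out[i] = 1 - out[i]
            else:
                out[i] = rng.randrange(n_classifiers)
    return out
```

For example, `ecoc_decode((1, 0, 1, 1, 0, 0, 1))` returns "C": the bit vector differs from C's codeword in one position and from every other codeword in at least three.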


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zhibin Xiong ◽  
Jun Huang

Purpose: Ensemble models that combine multiple base classifiers have been widely used to improve prediction performance in credit risk evaluation. However, an arbitrary selection of base classifiers is problematic. The purpose of this paper is to develop a framework for selecting base classifiers that improves the overall classification performance of an ensemble model.
Design/methodology/approach: In this study, selecting base classifiers is treated as a feature selection problem, in which the output of each base classifier is considered a feature. The proposed method, correlation-based classifier selection using the maximum information coefficient (MIC-CCS), selects the features (classifiers) through nonlinear optimization programming that seeks to optimize the relationship between the accuracy and the diversity of base classifiers, measured with MIC.
Findings: The empirical results show that ensemble models perform better than stand-alone ones, and that the ensemble model based on MIC-CCS outperforms both ensemble models with unselected base classifiers and ensemble models based on traditional forward and backward selection methods. Additionally, the ensemble model performs better when correlation is measured with MIC than when it is measured with the Pearson correlation coefficient.
Research limitations/implications: The study provides an alternative way to select base classifiers that differ significantly from one another, so that they provide complementary information; because the selected classifiers also have good predictive capabilities, the classification performance of the ensemble model improves.
Originality/value: This paper introduces MIC into the correlation-based selection process to better capture nonlinear and nonfunctional relationships in complex credit data, and constructs a novel nonlinear programming model for base classifier selection that has not been used in other studies.
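The idea of treating base-classifier outputs as features and trading accuracy off against diversity can be sketched with a greedy, correlation-based selection. This is a simplified stand-in, not MIC-CCS: it uses Pearson correlation instead of the maximum information coefficient (which the abstract reports works better), a greedy forward search instead of the paper's nonlinear program, correlation with the labels as a proxy for accuracy, and a hypothetical trade-off weight `alpha`.

```python
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def select_base_classifiers(outputs, y, k, alpha=0.5):
    """outputs: classifier name -> predictions on a validation set.
    Greedily add the classifier maximizing relevance (|corr| with y) minus
    alpha * mean |corr| with the classifiers already selected (diversity term)."""
    names = list(outputs)
    relevance = {n: abs(pearson(outputs[n], y)) for n in names}
    selected = []
    while len(selected) < min(k, len(names)):
        def merit(n):
            if not selected:
                return relevance[n]
            redundancy = sum(abs(pearson(outputs[n], outputs[s])) for s in selected)
            return relevance[n] - alpha * redundancy / len(selected)
        selected.append(max((n for n in names if n not in selected), key=merit))
    return selected

y = [0, 1, 0, 1, 0, 1, 0, 1]
outputs = {
    "a": [0, 1, 0, 1, 0, 1, 1, 1],  # one error
    "b": [0, 1, 0, 1, 0, 1, 1, 1],  # exact copy of "a": no added diversity
    "c": [1, 1, 1, 1, 0, 1, 0, 1],  # two errors, in different places from a's
}
print(select_base_classifiers(outputs, y, k=2))
```

Here "b" is as accurate as "a" but perfectly correlated with it, so the diversity penalty leads the search to pick the weaker but complementary "c" as the second member.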


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 90-90
Author(s):  
M. Maciukiewicz ◽  
J. Schniering ◽  
H. Gabrys ◽  
M. Brunner ◽  
C. Blüthgen ◽  
...  

Background: The interstitial lung disease (ILD) associated with connective tissue diseases, including systemic sclerosis (SSc), is a heterogeneous disease characterized by reduced survival of approximately 3 years (1). “Radiomics” is a field of research that describes the in-depth analysis of tissues by computational retrieval of high-dimensional quantitative features from medical images (2). Our previous study suggested that radiomic features can differentiate between “high” and “low” risk groups for lung function decline in two independent cohorts (3).
Objectives:
• To develop a robust machine learning (ML) workflow for “radiomics” data in SSc-ILD that selects the optimal methods for prediction.
• To use this workflow to predict the time to individual lung function decline, defined as the time to a relative decline of ≥15% in forced vital capacity (FVC)% as previously described (3).
Methods: We investigated two SSc-ILD cohorts: 90 patients (76.7% female, median age 57.5 years) from University Hospital Zurich and 66 patients (75.8% female, median age 61.0 years) from Oslo University Hospital. Patients were retrospectively selected (3) if they a) were diagnosed with early/mild SSc according to the Very Early Diagnosis of Systemic Sclerosis (VEDOSS) criteria and b) had ILD on HRCT as determined by a senior radiologist. For every subject, we defined 1,355 robust radiomic features from HRCT images. The follow-up period was defined as the time interval between the baseline visit and the last available follow-up visit.
We developed a systematic computational workflow to build predictive ML models. To reduce the number of redundant radiomic features, we applied correlation thresholds. We then applied distinct methods, including 1) Lasso penalized regression for feature selection and 2) random forest (RF) for modeling, using the R package ‘caret’. To select the optimal ML model, we randomly divided the derivation cohort into Training (70%) and Holdout (30%) sets and applied fivefold cross-validation (5kCV) for feature and classifier selection on the Training set only.
Results: We investigated various methods to select the optimal set of predictive radiomic features. Since ML model performance is affected by both feature and classifier selection, we assessed these factors first. The feature filtering and selection results suggested that combining a correlation threshold of 0.9 with Lasso regression performed best. Because feature selection was performed within the 5kCV workflow, features selected in at least 2 of the cross-validation sets entered the model optimization step. During model selection, we chose the RF classifier. We detected positive correlations between actual and predicted values, with Spearman’s rho = 0.313 (p = 0.167) in the Oslo set and Spearman’s rho = 0.341 (p = 0.015) in the Holdout set, as shown in Figure 1. The proportion of variance explained remained modest for both the Holdout (R² = 0.104) and Oslo (R² = 0.126) datasets.
Figure 1. Performance of the best (RF) classifier, shown as a scatterplot of actual versus predicted individual time to lung function decline.
Conclusion: In summary, we (1) developed an ML workflow that allowed us to select the optimal modeling methodology (i.e., feature and classifier selection), and (2) provided models that predicted the time to individual lung function decline, characterized by a significant correlation between predicted and actual values.
References:
[1] Hansell DM, Goldin JG, King TE Jr, Lynch DA, Richeldi L, Wells AU. CT staging and monitoring of fibrotic interstitial lung diseases in clinical practice and treatment trials: a position paper from the Fleischner Society. Lancet Respir Med. 2015;3(6):483-96.
[2] Lambin P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441-6.
[3] Schniering J, et al. Resolving phenotypic and prognostic differences in interstitial lung disease related to systemic sclerosis by computed tomography-based radiomics. medRxiv. https://www.medrxiv.org/content/10.1101/2020.06.09.20124800v1
Disclosure of Interests: None declared
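Two of the workflow steps described above, filtering redundant features with a correlation threshold of 0.9 and keeping only features selected in at least 2 cross-validation sets, can be sketched in standard-library Python, together with the Spearman's rho used to compare predicted and actual values. This is an illustration, not the authors' R/‘caret’ workflow: the per-fold selections stand in for Lasso output, which is not reproduced here, and keeping the first feature of each highly correlated pair (in input order) is an assumption, since the abstract does not state the tie-breaking rule. The feature names and values are hypothetical.

```python
import math
from collections import Counter

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def ranks(values):
    """1-based ranks in ascending order, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def correlation_filter(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) <= threshold for other in kept):
            kept.append(name)
    return kept

def stable_features(fold_selections, min_folds=2):
    """Features chosen (e.g., by Lasso) in at least `min_folds` cross-validation sets."""
    counts = Counter(f for fold in fold_selections for f in set(fold))
    return sorted(f for f, c in counts.items() if c >= min_folds)

# Hypothetical feature columns: f2 is nearly a rescaled copy of f1.
columns = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 10.5], "f3": [5, 3, 4, 1, 2]}
print(correlation_filter(columns))  # ['f1', 'f3']
print(stable_features([["f1", "f3"], ["f1"], ["f3", "f4"], ["f4"], ["f1"]]))
```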


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 818
Author(s):  
Eustace M. Dogo ◽  
Nnamdi I. Nwulu ◽  
Bhekisipho Twala ◽  
Clinton Aigbavboa

Automatic anomaly detection monitoring plays a vital role in water utilities’ distribution systems by reducing the risk that unclean water poses to consumers. One of the major problems in anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven effective for classification tasks on imbalanced datasets. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, given the asymmetric distribution of the dataset between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. Then, six dynamic selection techniques, namely Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U), and Meta-Learning for Dynamic Ensemble Selection (META-DES), are proposed and evaluated in combination with homogeneous and heterogeneous ensemble models, three SMOTE-based resampling algorithms (SMOTE, SMOTE+ENN, and SMOTE+Tomek Links), and one missing-data method (missForest). A binary real-world drinking-water quality anomaly detection dataset is used to evaluate the models. The experimental results reveal that all the models benefit from the combined optimisation of both the classifiers and the resampling methods.
Considering the three performance measures (balanced accuracy, F-score, and G-mean), the results also show that the dynamic classifier selection (DCS) techniques, in particular the missForest+SMOTE+RANK and missForest+SMOTE+OLA models based on homogeneous ensemble bagging with a decision tree as the base classifier, performed better in terms of balanced accuracy and G-mean, while the Bg+mF+SMENN+LCA model based on homogeneous ensemble bagging with random forest achieved a better overall F1-measure than the other models.
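Dynamic classifier selection chooses a classifier per test sample rather than fixing one ensemble for all inputs. As an illustration of one of the techniques named above, Overall Local Accuracy (OLA), the sketch below picks, for each query, the pool member that is most accurate on the query's k nearest neighbours in a labelled dynamic-selection (DSEL) set. This is a standard-library Python sketch under our own assumptions (Euclidean distance, k = 3, toy classifiers and data); it is not the authors' pipeline, which also involves bagged base classifiers, SMOTE-based resampling and missForest imputation. Maintained implementations of OLA, LCA, Rank, KNORA-E, KNORA-U and META-DES exist in, for example, the DESlib Python library.

```python
import math

def ola_select(x, dsel_X, dsel_y, pool, k=3):
    """Return the pool member with the highest accuracy on x's k nearest DSEL samples."""
    neighbours = sorted(range(len(dsel_X)), key=lambda i: math.dist(dsel_X[i], x))[:k]

    def local_accuracy(clf):
        return sum(clf(dsel_X[i]) == dsel_y[i] for i in neighbours) / len(neighbours)

    return max(pool, key=local_accuracy)  # ties go to the earlier pool member

def ola_predict(x, dsel_X, dsel_y, pool, k=3):
    """Classify x with the locally most accurate classifier."""
    return ola_select(x, dsel_X, dsel_y, pool, k)(x)

# Toy pool: each "classifier" is only competent in one region of feature space.
def predicts_normal(p):
    return 0

def predicts_anomaly(p):
    return 1

dsel_X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]
dsel_y = [0, 0, 0, 1, 1, 1]  # 0 = normal water quality, 1 = anomaly
pool = [predicts_normal, predicts_anomaly]
print(ola_predict((0.5, 0.5), dsel_X, dsel_y, pool))  # 0
print(ola_predict((9.0, 9.5), dsel_X, dsel_y, pool))  # 1
```

Because competence is estimated locally, a pool member that is weak overall can still be selected in the region where it performs well, which is the property the abstract relies on for the minority (anomaly) class.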

