Hyperparameter tuning for multi-label classification of feedbacks in online courses

In this work, we propose an extension of a methodology for the multi-label classification of feedback according to the Hattie and Timperley feedback model, incorporating a hyperparameter tuning stage. We analyze whether adding this stage before running the support vector machine, random forest, and multi-label k-nearest neighbors algorithms improves the performance metrics of the multi-label classifiers. These classifiers automatically assign the feedback that a teacher gives on activities submitted by students in online courses on the Blackboard platform to the task, process, regulation, praise, and other levels proposed in the Hattie and Timperley feedback model. A grid search strategy is used to tune the hyperparameters of each algorithm. The results show that tuning the hyperparameters improves the performance metrics for the data set used.
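
The grid search strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a toy single-label k-nearest-neighbors classifier, made-up two-dimensional data, and a hypothetical grid over `k` and `weighted`, and it scores each combination by accuracy on a held-out validation set.

```python
from itertools import product

def knn_predict(train, query, k, weighted):
    """Predict a label for `query` by voting among its k nearest training points."""
    # Rank training points by squared Euclidean distance to the query.
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in train
    )[:k]
    votes = {}
    for dist, label in neighbors:
        # Weighted votes favor closer neighbors; uniform votes count each once.
        votes[label] = votes.get(label, 0.0) + (1.0 / (dist + 1e-9) if weighted else 1.0)
    return max(sorted(votes), key=votes.get)

def accuracy(train, valid, k, weighted):
    """Fraction of validation points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k, weighted) == y for x, y in valid)
    return hits / len(valid)

def grid_search(train, valid, grid):
    """Evaluate every combination in `grid`; return (best_score, best_params)."""
    names = sorted(grid)
    best_score, best_params = -1.0, None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = accuracy(train, valid, **params)
        if score > best_score:  # ties keep the first combination found
            best_score, best_params = score, params
    return best_score, best_params

# Made-up data: two feedback levels represented as 2-D feature vectors.
train = [((0.0, 0.0), "task"), ((0.1, 0.2), "task"), ((0.2, 0.1), "task"),
         ((1.0, 1.0), "process"), ((0.9, 1.1), "process"), ((1.1, 0.9), "process")]
valid = [((0.15, 0.15), "task"), ((0.95, 1.0), "process")]
grid = {"k": [1, 3, 5], "weighted": [False, True]}  # hypothetical search space

best_score, best_params = grid_search(train, valid, grid)
print(best_score, best_params)
```

In practice the score would normally come from cross-validation rather than a single split, and a multi-label metric would replace plain accuracy.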

A new classification strategy for human activity recognition using cost sensitive support vector machines for imbalanced data

Kybernetes, DOI 10.1108/k-07-2014-0138, 2014, Vol 43 (8), pp. 1150-1164, Cited By ~ 9

Author(s): Bilal M’hamed Abidine, Belkacem Fergani, Mourad Oussalah, Lamya Fergani

Keyword(s): Support Vector Machines, Probabilistic Models, Conditional Random Fields, Performance Metrics, Imbalanced Data, Sampling Technique, Support Vector, Data Set, Content Type, Vector Machines

Purpose – Identifying activity classes from sensor information in smart homes is very challenging because such data sets are imbalanced: some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues. Design/methodology/approach – The authors propose a robust strategy that combines the Synthetic Minority Over-sampling Technique (SMOTE) with Cost Sensitive Support Vector Machines (CS-SVM), using adaptive tuning of the cost parameter to handle the imbalanced data problem. Findings – The results demonstrate the usefulness of the approach through comparison with state-of-the-art approaches, including HMM, CRF, the traditional C-Support Vector Machines (C-SVM) and the Cost-Sensitive SVM (CS-SVM), for classifying activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiment/simulation include accuracy, precision/recall and F-measure.
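
The two ingredients named in the abstract, SMOTE oversampling and class-dependent misclassification costs, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `smote` interpolates between a minority point and one of its nearest minority neighbors, and `class_costs` uses a common inverse-frequency heuristic rather than the adaptive tuning described in the paper. All data and names are made up.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbors of x, excluding x itself.
        others = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbor = rng.choice(others)
        u = rng.random()  # position along the segment from x to the neighbor
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbor)))
    return synthetic

def class_costs(labels):
    """Inverse-frequency costs: rarer classes get larger misclassification costs."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, n_classes = len(labels), len(counts)
    return {y: n / (n_classes * count) for y, count in counts.items()}

# Made-up minority-class feature vectors and an imbalanced label list.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
new_points = smote(minority, n_new=5)
costs = class_costs(["sleep"] * 8 + ["cook"] * 2)
print(new_points)
print(costs)
```

The resulting costs would typically be passed to the classifier as per-class weights on the SVM penalty parameter, so that errors on rare activities are penalized more heavily.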

Analyzing the Effectiveness of the Brain–Computer Interface for Task Discerning Based on Machine Learning

Sensors, DOI 10.3390/s20082403, 2020, Vol 20 (8), pp. 2403

Author(s): Jakub Browarczyk, Adam Kurowski, Bozena Kostek

Keyword(s): Feature Extraction, Principal Component, Component Analysis, Mental States, Extraction Methods, Support Vector, Discrete Wavelet, K Nearest Neighbors, Vector Machines

The aim of the study is to compare electroencephalographic (EEG) signal feature extraction methods in terms of how effectively they support the classification of brain activities. For classification, EEG signals were obtained using an EEG device from 17 subjects in three mental states (relaxation, excitation, and solving a logical task). Blind source separation employing independent component analysis (ICA) was performed on the obtained signals. Welch’s method, autoregressive modeling, and discrete wavelet transform were used for feature extraction. Principal component analysis (PCA) was performed to reduce the dimensionality of the feature vectors. k-Nearest Neighbors (kNN), Support Vector Machines (SVM), and Neural Networks (NN) were employed for classification. Precision, recall, and F1 score are reported, together with a discussion based on statistical analysis. The paper also contains the code used in preprocessing and in the main part of the experiments.

A Recommendation System & Their Performance Metrics using several ML Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue, DOI 10.35940/ijeat.c5791.029320, 2020, Vol 9 (3), pp. 2445-2451

Keyword(s): Support Vector Machine, Logistic Regression, Search Engine, Recommendation System, Performance Metrics, Nearest Neighbors, Support Vector, K Nearest Neighbors, Data Set, Google Search

Recommendation systems are a subdivision of data refinement that seek to anticipate the rating or preference a user would give to an item. They produce user-customized recommendations for a product or service. Recommendation systems are used in services such as the Google search engine, YouTube, and Gmail, as well as in product recommendation services on e-commerce websites. These systems usually depend on a content-based approach. In this paper, we develop recommendation systems of this type using several algorithms: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), MultinomialNB (MNB), and Multi-layer Perceptron (MLP). These algorithms predict the nearest categories from the News Category data, and from these categories we recommend the most common sentence to the user and analyze the performance metrics. The approach is tested on the News Category data set, which contains roughly 200k news headlines in 41 classes, collected from HuffPost between 2012 and 2018.

Satellite image classification and quality parameters using ML classifier

International Journal of Engineering & Technology, DOI 10.14419/ijet.v7i1.8.9441, 2018, Vol 7 (1.8), pp. 6

Author(s): K. Radhika, S. Varadarajan

Keyword(s): Maximum Likelihood, Satellite Images, Performance Metrics, Satellite Image, Feature Space, Support Vector, Vector Machines, Supervised Classifiers, Multispectral Satellite Images

Remote sensing images are an important source of information about the Earth's surface. Many applications, such as geology, urban planning, forestry, and land cover/land use, need the underlying information from such images. This information is usually extracted through classification, which is one of the most powerful tools in digital image processing, so a good classifier is required. Recent methods for classifying pixels in multispectral satellite images are supervised classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (K-NN), and the Maximum Likelihood (ML) classifier. SVM may be one-class SVM or multi-class SVM. K-NN is a simple technique in high-dimensional feature space. In the ML classifier, classification is based on the maximum likelihood of the pixel. The performance metrics for these classifiers are calculated and compared. In total, 200 points were considered for validation.

Classification of Soils into Hydrologic Groups Using Machine Learning

Data, DOI 10.3390/data5010002, 2019, Vol 5 (1), pp. 2, Cited By ~ 4

Author(s): Shiny Abraham, Chau Huynh, Huy Vu

Keyword(s): Machine Learning, Water Conservation, Large Scale, Performance Metrics, Gaussian Kernel, Support Vector, K Nearest Neighbors, Group B, Soil Groups

Hydrologic soil groups play an important role in the determination of surface runoff, which, in turn, is crucial for soil and water conservation efforts. Traditionally, placement of soil into appropriate hydrologic groups is based on the judgement of soil scientists, primarily relying on their interpretation of guidelines published by regional or national agencies. As a result, large-scale mapping of hydrologic soil groups results in widespread inconsistencies and inaccuracies. This paper presents an application of machine learning for classification of soil into hydrologic groups. Based on features such as percentages of sand, silt and clay, and the value of saturated hydraulic conductivity, machine learning models were trained to classify soil into four hydrologic groups. The results of the classification obtained using algorithms such as k-Nearest Neighbors, Support Vector Machine with Gaussian Kernel, Decision Trees, Classification Bagged Ensembles and TreeBagger (Random Forest) were compared to those obtained using estimation based on soil texture. The performance of these models was compared and evaluated using per-class metrics and micro- and macro-averages. Overall, performance metrics related to kNN, Decision Tree and TreeBagger exceeded those for SVM-Gaussian Kernel and Classification Bagged Ensemble. Among the four hydrologic groups, it was noticed that group B had the highest rate of false positives.
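
The difference between per-class metrics, macro-averages, and micro-averages can be made concrete with a short sketch. The labels below are made up for illustration and are not the paper's data; only precision is computed, but recall and F1 follow the same pattern.

```python
def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            counts[truth]["tp"] += 1
        else:
            counts[pred]["fp"] += 1   # predicted class gains a false positive
            counts[truth]["fn"] += 1  # true class gains a false negative
    return counts

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def precision_summary(y_true, y_pred):
    """Return (per-class precision, macro-average, micro-average)."""
    counts = per_class_counts(y_true, y_pred)
    per_class = {c: safe_div(v["tp"], v["tp"] + v["fp"]) for c, v in counts.items()}
    # Macro: unweighted mean of the per-class values.
    macro = sum(per_class.values()) / len(per_class)
    # Micro: pool the counts over all classes, then compute one ratio.
    tp = sum(v["tp"] for v in counts.values())
    fp = sum(v["fp"] for v in counts.values())
    micro = safe_div(tp, tp + fp)
    return per_class, macro, micro

# Made-up labels for four hypothetical soil groups A-D.
y_true = ["A", "A", "B", "B", "B", "C", "D", "D"]
y_pred = ["A", "B", "B", "B", "B", "B", "D", "C"]

counts = per_class_counts(y_true, y_pred)
per_class, macro, micro = precision_summary(y_true, y_pred)
print(per_class, macro, micro)
```

Macro-averaging treats every class equally, so a rare class with poor precision pulls the average down; micro-averaging weights each prediction equally, so frequent classes dominate.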

Exploring a knowledge-based approach to predicting NACE codes of enterprises based on web page texts

Statistical Journal of the IAOS, DOI 10.3233/sji-200675, 2020, Vol 36 (3), pp. 807-821

Author(s): Heidi Kühnemann, Arnout van Delden, Dick Windmeijer

Keyword(s): Economic Activity, Model Performance, Support Vector, Web Pages, Data Set, Filter Model, Domain Specific, Knowledge Based, Vector Machines

Classification of enterprises by main economic activity according to NACE codes is a challenging but important task for national statistical institutes. Since manual editing is time-consuming, we investigated the automatic prediction from dedicated website texts using a knowledge-based approach. To that end, concept features were derived from a set of domain-specific keywords. Furthermore, we compared flat classification to a specific two-level hierarchy which was based on an approach used by manual editors. We limited ourselves to Naïve Bayes and Support Vector Machines models and only used texts from the main web pages. As a first step, we trained a filter model that classifies whether websites contain information about economic activity. The resulting filtered data set was subsequently used to predict 111 NACE classes. We found that using concept features did not improve the model performance compared to a model with character n-grams, i.e. non-informative features. Neither did the two-level hierarchy improve the performance relative to a flat classification. Nonetheless, prediction of the best three NACE classes clearly improved the overall prediction performance compared to a top-one prediction. We conclude that more effort is needed in order to achieve good results with a knowledge-based approach and discuss ideas for improvement.
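
The comparison between a top-one prediction and the best three predicted classes corresponds to top-k accuracy, which can be sketched as below. The class names and scores are hypothetical placeholders, not real NACE codes or model outputs from the paper.

```python
def top_k_accuracy(scores, y_true, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for class_scores, truth in zip(scores, y_true):
        # Rank class names from highest to lowest score.
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(y_true)

# Hypothetical per-class scores for four websites and four placeholder classes.
scores = [
    {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.05},
    {"A": 0.40, "B": 0.35, "C": 0.20, "D": 0.05},
    {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40},
    {"A": 0.25, "B": 0.45, "C": 0.20, "D": 0.10},
]
y_true = ["A", "C", "A", "A"]

top1 = top_k_accuracy(scores, y_true, k=1)
top3 = top_k_accuracy(scores, y_true, k=3)
print(top1, top3)
```

Top-k accuracy can never decrease as k grows, which is why a top-three evaluation is most useful when a human editor makes the final choice among the suggested classes.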

An Enhanced Corpus for Arabic Newspapers Comments

The International Arab Journal of Information Technology, DOI 10.34028/iajit/17/5/12, 2020, Vol 17 (5), pp. 789-798

Author(s): Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi

Keyword(s): Support Vector Machines, Web Sites, Naive Bayes, Nearest Neighbors, Naïve Bayes, Support Vector, K Nearest Neighbors, Vector Machines

In this paper, we propose an enhanced approach for creating a dedicated corpus of Algerian Arabic newspaper comments. The approach enhances an existing one by enriching the available corpus and adding an annotation step that follows the Model Annotate Train Test Evaluate Revise (MATTER) approach. The corpus is created by collecting comments from the web sites of three well-known Algerian newspapers. Three classifiers, support vector machines, naïve Bayes, and k-nearest neighbors, were used to classify comments into positive and negative classes. To identify the influence of stemming on the results, the classification was tested with and without stemming. The results show that stemming does not considerably improve the classification, owing to the nature of Algerian comments, which are tied to the Algerian Arabic dialect. The promising results motivate us to improve our approach, especially in dealing with non-Arabic sentences, in particular dialectal and French ones.
