Inclusion of features derived from a mixture of time window sizes improved classification accuracy of machine learning algorithms for sheep grazing behaviours

Question terminology is a set of terms which appear in keywords, idioms and fixed expressions commonly observed in questions. This paper investigates ways to automatically extract question terminology from a corpus of questions and to represent it for the purpose of classifying questions by type. Our key interest is whether semantic features can enhance the representation of the strongly lexical nature of question sentences. We compare two feature sets: one with lexical features only, and another with a mixture of lexical and semantic features. For evaluation, we measure the classification accuracy achieved by two machine learning algorithms, C5.0 and PEBLS, using a procedure called domain cross-validation, which effectively measures the domain transferability of features.
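
One plausible reading of "domain cross-validation" is leave-one-domain-out evaluation: train on the questions from every domain but one, then test on the held-out domain. The sketch below assumes that reading. The `train_fn`/`predict_fn` callables, the toy first-word classifier and the sample questions are all hypothetical stand-ins; they are not C5.0, PEBLS, or the paper's data.

```python
from collections import Counter, defaultdict

def domain_cross_validation(samples, train_fn, predict_fn):
    """Leave-one-domain-out accuracy.

    samples: list of (features, label, domain) tuples.
    Returns {domain: accuracy on that held-out domain}.
    """
    domains = sorted({d for _, _, d in samples})
    scores = {}
    for held_out in domains:
        train = [(x, y) for x, y, d in samples if d != held_out]
        test = [(x, y) for x, y, d in samples if d == held_out]
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == y for x, y in test)
        scores[held_out] = correct / len(test)
    return scores

# Toy stand-in classifier (hypothetical): map a question's first word
# to the label it most often carried in the training domains.
def train_first_word(train):
    votes = defaultdict(Counter)
    for words, label in train:
        votes[words[0]][label] += 1
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}

def predict_first_word(model, words):
    return model.get(words[0], "OTHER")

samples = [
    (("who", "wrote", "hamlet"), "PERSON", "literature"),
    (("when", "was", "hamlet", "written"), "TIME", "literature"),
    (("who", "discovered", "penicillin"), "PERSON", "medicine"),
    (("how", "does", "aspirin", "work"), "MANNER", "medicine"),
]
scores = domain_cross_validation(samples, train_first_word, predict_first_word)
```

A feature that scores well only when its own domain is in the training set will show a drop under this procedure, which is what makes it a measure of transferability.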

A Spasticity Assessment Method for Voluntary Movement using Data Fusion and Machine Learning

DOI: 10.21203/rs.3.rs-19134/v1 · 2020

Author(s): Yan Chen, Song Yu, Qing Cai, Shuangyuan Huang, Ke Ma, ...

Keyword(s): Machine Learning, Classification Accuracy, Voluntary Movement, Muscle Activation, Evaluation Method, Rating Scale, Learning Algorithms, Machine Learning Algorithms, Measurement Unit, Stroke Patients

Background: Spasticity is a common complication of stroke. Effective spasticity management can improve patients' recovery efficiency and reduce their pain. Current clinical spasticity rating scales are subjective and exhibit a ceiling effect, which makes it difficult to evaluate spasm objectively and to clinically analyze the pathological mechanism of spasticity. Sensor-based quantitative evaluation is an effective substitute for the clinical spasm rating scale, but it currently focuses mainly on spasm evaluation during passive motion. Studying the spasmodic state during active exercise can provide a basis for treatment and rehabilitation training, but no evaluation method for the spasmodic state during active exercise has yet been established. Therefore, we combine an inertial measurement unit (IMU) and surface electromyography (sEMG) to test the feasibility of assessing spasticity patterns in stroke patients during voluntary movement. Methods: Nine stroke patients with varying degrees of spasticity and four healthy subjects performed isometric elbow exercises. sEMG and kinematic signals were recorded for all participants. The Empirical Mode Decomposition (EMD) algorithm and double-threshold algorithms were used to separate the sEMG of involuntary muscle activation from that of voluntary activation. Then, feature extraction and feature fusion were performed. Four common machine learning algorithms were used to monitor and evaluate spasticity patterns. The validity of the proposed method was verified by comparing the classification accuracy of the four machine learning models. Results: Cross-validation yielded high classification accuracies (F1-score > 0.88) for all four machine learning classifiers in assessing spasticity patterns. The highest detection performance was obtained using the Random Forest algorithm (average accuracy = 0.979; macro-F1 = 0.976). Conclusions: We present a novel method for assessing post-stroke spasticity based on voluntary movement and machine learning. Good classification performance verifies the feasibility of evaluating spasticity patterns with our method. The reliable classification accuracy achieved by the machine learning algorithms indicates the potential to evaluate spasticity patterns using IMU and sEMG when stroke survivors perform voluntary movements.
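
The results above are reported as accuracy and macro-F1. As a minimal sketch of how those two metrics are usually computed (macro-F1 is the unweighted mean of the per-class F1 scores), the snippet below uses made-up labels; the class names and predictions are hypothetical and are not the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical spasticity-pattern labels, for illustration only.
y_true = ["none", "none", "mild", "mild", "severe", "severe"]
y_pred = ["none", "none", "mild", "severe", "severe", "severe"]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

With a small and unbalanced cohort, macro-F1 is a useful complement to accuracy because a classifier cannot score well by favouring the majority class.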

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music

Applied Sciences · DOI: 10.3390/app10175956 · 2020 · Vol 10 (17) · pp. 5956

Author(s): Sławomir K. Zieliński, Hyunkook Lee, Paweł Antoniuk, Oskar Dadan

Keyword(s): Machine Learning, Classification Accuracy, Screening Test, Learning Algorithms, Machine Learning Algorithms, Support Vector, Spatial Audio, Extreme Gradient Boosting, Music Ensemble

The purpose of this paper is to compare the performance of human listeners against selected machine learning algorithms in the task of classifying spatial audio scenes in binaural recordings of music under practical conditions. Three scenes were subject to classification: (1) music ensemble (a group of musical sources) located in the front, (2) music ensemble located at the back, and (3) music ensemble distributed around a listener. In the listening test, undertaken remotely over the Internet, human listeners reached a classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The same classification task was also undertaken automatically using four machine learning algorithms: convolutional neural network, support vector machines, extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed human listeners, with the classification accuracy reaching 84%, when tested under binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, their accuracy was comparable to that of the listeners who passed the post-screening test, implying that the algorithms' capability to perform in unknown electro-acoustic conditions needs to be further improved.

Comparison of classical machine learning algorithms in the task of handwritten digits classification

Journal of Computer Sciences Institute · DOI: 10.35784/jcsi.2723 · 2021 · Vol 21 · pp. 279-286

Author(s): Oleksandr Voloshchenko, Małgorzata Plechawska-Wójcik

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Classification Accuracy, Learning Algorithms, Machine Learning Algorithms, Learning Speed, Host Machine, Speed Prediction, Handwritten Digit

The purpose of this paper is to compare classical machine learning algorithms for handwritten digit classification. The following algorithms were chosen for comparison: Logistic Regression, SVM, Decision Tree, Random Forest and k-NN. The MNIST handwritten digit database is used to train and test these algorithms. The dataset consists of 70,000 images of digits from 0 to 9. The algorithms are compared on learning speed, prediction construction speed, host machine load, and classification accuracy. Each algorithm went through the training and testing phases 100 times, with the desired KPIs recorded at each iteration. The results were averaged to obtain reliable outcomes.
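
The repeat-and-average protocol described above can be sketched as a small harness that times training and prediction on each run and averages the recorded values. This is only an illustration under assumptions: the abstract does not say whether the train/test split was re-drawn on each iteration (the sketch reshuffles), host machine load is not measured here, and the 1-nearest-neighbour stand-in and toy data are hypothetical, not the paper's setup.

```python
import random
import statistics
import time

def benchmark(train_fn, predict_fn, data, repeats=100, test_fraction=0.2, seed=0):
    """Run train/test `repeats` times and return the mean of each recorded KPI."""
    rng = random.Random(seed)
    runs = {"train_s": [], "predict_s": [], "accuracy": []}
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)  # assumption: a fresh random split per iteration
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]

        t0 = time.perf_counter()
        model = train_fn(train)
        t1 = time.perf_counter()
        preds = [predict_fn(model, x) for x, _ in test]
        t2 = time.perf_counter()

        runs["train_s"].append(t1 - t0)
        runs["predict_s"].append(t2 - t1)
        hits = sum(p == y for p, (_, y) in zip(preds, test))
        runs["accuracy"].append(hits / len(test))
    return {name: statistics.mean(values) for name, values in runs.items()}

# Hypothetical stand-in model: 1-nearest-neighbour on a single numeric feature.
def train_1nn(train):
    return list(train)

def predict_1nn(model, x):
    return min(model, key=lambda item: abs(item[0] - x))[1]

# Toy, well-separated data (not MNIST).
data = [(v, 0) for v in range(10)] + [(v, 1) for v in range(100, 110)]
summary = benchmark(train_1nn, predict_1nn, data, repeats=100)
```

Averaging over many runs mainly smooths out timing noise; reporting the spread alongside the mean (for example with `statistics.stdev`) would show how stable each KPI is.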

Classification of Control and Neurodegenerative Disease Subjects Using Tree Based Classifiers

Journal of Pharmaceutical Research International · DOI: 10.9734/jpri/2020/v32i1130546 · 2020 · pp. 63-73

Author(s): Syed Ahsin Ali Shah, Nazneen Habib, Wajid Aziz, Ehsan Ullah Khan, Malik Sajjad Ahmed Nadeem

Keyword(s): Machine Learning, Early Detection, Classification Accuracy, Neural Control, Learning Algorithms, Machine Learning Algorithms, Machine Learning Techniques, Pharmacological Interventions, Non Invasive

Background: Medical researchers are developing non-invasive methods for the early detection of neurodegenerative diseases (NDDs), when pharmacological interventions can still prevent further disease progression. NDDs are associated with degradation of complex gait dynamics and motor activity. Classifying gait data with machine learning techniques can assist physicians in the early diagnosis of a neural disorder, before clinical manifestations of the disease are apparent. Aims: The present study was undertaken to classify control and NDD subjects using decision-tree-based classifiers (Random Forest (RF), J48 and REPTree). Methodology: The data comprise 16 control, 20 Huntington’s Disease (HD), 15 Parkinson’s Disease (PD), and 13 Amyotrophic Lateral Sclerosis (ALS) subjects, taken from a publicly available PhysioNet database. The age range was 20-74 for control subjects, 36-70 for HD subjects, 44-80 for PD subjects, and 29-71 for ALS subjects. There were 13 attributes associated with the data. Important features were selected using the correlation feature selection subset evaluation (CFS) method. Three tree-based machine learning algorithms (RF, J48 and REPTree) were used to classify the control and NDD subjects. Classifier performance was evaluated using Precision, Recall, F-Measure, MAE and RMSE. Results: To evaluate the tree-based classifiers, two settings of the data were used: complete features and selected features. In classifying control vs HD subjects, RF provided the most robust separation, with classification accuracy of 84.79% using complete features and 83.94% using selected features. In classifying control vs PD subjects and control vs ALS subjects, RF also provided the best separation, with classification accuracies of 86.51% and 94.95% respectively using complete features, and 85.19% and 93.64% respectively using selected features. Conclusion: Variability analysis of physiological signals provides a valuable non-invasive tool for quantifying the dynamics of healthy subjects' systems and for examining alterations in the controlling mechanisms of these systems with aging and disease. It is concluded that the selected features encode adequate information about neural control of gait. Moreover, the selected features, together with tree-based machine learning algorithms, can play a vital role in the early detection of NDDs, when pharmacological interventions are still possible.
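
The abstract does not give the scoring rule behind its correlation-based feature selection. As a sketch, assuming the commonly cited CFS merit heuristic (merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation), a subset can be scored as follows. The correlation values in the example are invented for illustration.

```python
import math

def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset under the (assumed) CFS heuristic.

    feature_class_corrs: one correlation with the class per feature in the subset.
    feature_feature_corrs: correlations for each pair of features in the subset
                           (empty when the subset holds a single feature).
    """
    k = len(feature_class_corrs)
    r_cf = sum(abs(r) for r in feature_class_corrs) / k
    if feature_feature_corrs:
        r_ff = sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Illustrative values only: two equally predictive features.
independent = cfs_merit([0.6, 0.6], [0.0])  # uncorrelated with each other
redundant = cfs_merit([0.6, 0.6], [1.0])    # perfectly correlated with each other
single = cfs_merit([0.6], [])               # just one of them
```

Under this heuristic a fully redundant second feature adds nothing over keeping one feature alone, while an independent one raises the merit. That is consistent with the abstract's finding that a reduced feature set costs only about one percentage point of accuracy.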

A feature-centric spam email detection model using diverse supervised machine learning algorithms

The Electronic Library · DOI: 10.1108/el-07-2019-0181 · 2020 · Vol 38 (3) · pp. 633-657

Author(s): Ammara Zamir, Hikmat Ullah Khan, Waqar Mehmood, Tassawar Iqbal, Abubakker Usman Akram

Keyword(s): Machine Learning, Sentiment Analysis, Classification Accuracy, Research Study, Learning Algorithms, Machine Learning Algorithms, Supervised Machine Learning, Content Type, Detection Model, Proposed Model

Purpose: This research study proposes a feature-centric spam email detection model (FSEDM) based on content, sentiment, semantic, user and spam-lexicon feature sets. The purpose of this study is to exploit the role of sentiment features, along with the other proposed features, in the classification accuracy of machine learning algorithms for spam email detection. Design/methodology/approach: Existing studies primarily exploit a content-based feature engineering approach; however, only a limited number of features is considered. In this regard, this research study proposes a feature-centric framework (FSEDM) based on existing and novel features of an email data set, which are extracted after pre-processing. Afterwards, diverse supervised learning techniques are applied to the proposed features in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F to rank the most prominent features and classify the emails as spam or ham (not spam). Findings: Analysis and experimental results indicate that the proposed model with sentiment analysis is a competitive approach for spam email detection. Using the proposed model, a deep neural network applied with sentiment features outperformed the other classifiers, with classification accuracy of up to 97.2%. Originality/value: This research is novel in that no previous research focuses on sentiment analysis in conjunction with other email features for the detection of spam emails.
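
Of the feature selection techniques named above, information gain is the simplest to illustrate: it is the reduction in label entropy obtained by splitting the data on a feature's values. The following is a minimal sketch for discrete features; the feature names (`contains_offer`, `has_attachment`) and the four toy emails are hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(subset) / total * entropy(subset) for subset in by_value.values())
    return entropy(labels) - conditional

# Toy data, for illustration only.
labels = ["spam", "spam", "ham", "ham"]
features = {
    "contains_offer": [1, 1, 0, 0],  # perfectly separates spam from ham
    "has_attachment": [1, 0, 1, 0],  # carries no information about the label
}

# Rank features from most to least informative.
ranking = sorted(
    ((name, information_gain(values, labels)) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

Information gain favours features with many distinct values; gain ratio, also named in the abstract, corrects for this by dividing by the entropy of the feature's own value distribution.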

Nondestructive Classification of Soybean Seed Varieties by Hyperspectral Imaging and Ensemble Machine Learning Algorithms

Sensors · DOI: 10.3390/s20236980 · 2020 · Vol 20 (23) · pp. 6980

Author(s): Yanlin Wei, Xiaofeng Li, Xin Pan, Lei Li

Keyword(s): Machine Learning, Hyperspectral Imaging, Classification Accuracy, Learning Algorithms, Soybean Seed, Machine Learning Algorithms, Ensemble Classification, Support Vector, Ensemble Machine Learning, Different Types

During the processing and planting of soybeans, it is important to have a reliable, rapid, and accurate technique for identifying soybean varieties. Traditional chemical analysis methods for soybean variety sampling (e.g., mass spectrometry and high-performance liquid chromatography) are destructive and time-consuming. In this paper, a robust and accurate method for nondestructive soybean classification is developed using hyperspectral imaging and ensemble machine learning algorithms. Image acquisition, preprocessing, and feature selection are used to obtain different types of soybean hyperspectral features. Based on these features, an ensemble classifier, the random subspace linear discriminant (RSLD) algorithm, is used to classify soybean seeds. Compared with the linear discrimination (LD) and linear support vector machine (LSVM) methods, the results show that the RSLD algorithm is more stable and reliable. In classifying soybeans into 10, 15, 20, and 25 categories, the RSLD method achieves the highest classification accuracy. When 155 features are used to classify 15 types of soybeans, the classification accuracy of the RSLD method reaches 99.2%, while the classification accuracies of the LD and LSVM methods are only 98.6% and 69.7%, respectively. Therefore, the ensemble classification algorithm RSLD can maintain high classification accuracy across different numbers of soybean types and different classification features.
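
The random subspace idea behind RSLD can be sketched as follows: each base learner is trained on a random subset of the features, and the ensemble predicts by majority vote. To keep the sketch self-contained, a nearest-centroid classifier is swapped in for the linear discriminant base learner, so this is not the paper's RSLD. The ensemble size, subspace size and toy data are likewise assumptions.

```python
import random
from collections import Counter

def fit_centroids(X, y, feature_idx):
    """Per-class mean vector, restricted to the chosen feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feature_idx))
        for j, f in enumerate(feature_idx):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row, feature_idx):
    """Label of the closest class centroid within the feature subspace."""
    def sq_dist(center):
        return sum((row[f] - c) ** 2 for f, c in zip(feature_idx, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def fit_random_subspace(X, y, n_learners=11, subspace_size=2, seed=0):
    """Train each base learner on its own random subset of features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        idx = rng.sample(range(n_features), subspace_size)
        ensemble.append((idx, fit_centroids(X, y, idx)))
    return ensemble

def predict_random_subspace(ensemble, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(c, row, idx) for idx, c in ensemble)
    return votes.most_common(1)[0][0]

# Toy data with four features (not hyperspectral measurements).
X = [[0, 0, 0, 0], [1, 1, 1, 1], [10, 10, 10, 10], [11, 11, 11, 11]]
y = ["A", "A", "B", "B"]
ensemble = fit_random_subspace(X, y)
pred_a = predict_random_subspace(ensemble, [0.5, 0.2, 0.8, 0.1])
pred_b = predict_random_subspace(ensemble, [10.5, 9.8, 10.2, 11.0])
```

Because each learner sees only a few features, no single base model has to estimate parameters in the full feature space. This is one common motivation for using random subspaces with many features, such as the 155 mentioned above.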

Phishing web site detection using diverse machine learning algorithms

The Electronic Library · DOI: 10.1108/el-05-2019-0118 · 2020 · Vol 38 (1) · pp. 65-80 · Cited By ~ 8

Author(s): Ammara Zamir, Hikmat Ullah Khan, Tassawar Iqbal, Nazish Yousaf, Farah Aslam, ...

Keyword(s): Machine Learning, Feature Selection, Classification Accuracy, Information Gain, Learning Algorithms, Machine Learning Algorithms, Recursive Feature Elimination, Support Vector, Data Set, Content Type

Purpose: This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud used to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing websites that closely resemble the original websites. As people surf the targeted website, the phishers hijack their personal information. Design/methodology/approach: Features of a phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, Naïve Bayes and k-nearest neighbour [kNN]) is applied to the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are applied by combining the highest-scoring classifiers to improve the classification accuracy. Findings: The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in detecting phishing websites, with 97.4% classification accuracy. Originality/value: This research is novel in that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.
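
Stacking trains a second-level (meta) model on the predictions of first-level (base) models rather than on the raw features. The sketch below shows a simple hold-out variant under loud assumptions. The base learners are one-feature threshold stumps instead of RF, NN or bagging, and the meta-learner is a lookup table over base-prediction patterns. The data are invented, and the base set is assumed to contain both classes, with labels 0 and 1.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Threshold one feature at the midpoint between the two class means."""
    def mean(values):
        return sum(values) / len(values)
    m0 = mean([row[feature] for row, label in zip(X, y) if label == 0])
    m1 = mean([row[feature] for row, label in zip(X, y) if label == 1])
    return feature, (m0 + m1) / 2, m1 >= m0

def predict_stump(stump, row):
    feature, threshold, one_is_high = stump
    return int((row[feature] > threshold) == one_is_high)

def fit_stacking(X_base, y_base, X_meta, y_meta, features):
    """Fit base stumps on one split, then fit the meta-learner on their
    predictions for a separate split (a hold-out form of stacking)."""
    base = [fit_stump(X_base, y_base, f) for f in features]
    table = defaultdict(Counter)
    for row, label in zip(X_meta, y_meta):
        key = tuple(predict_stump(s, row) for s in base)
        table[key][label] += 1
    meta = {key: votes.most_common(1)[0][0] for key, votes in table.items()}
    default = Counter(y_meta).most_common(1)[0][0]  # fallback for unseen patterns
    return base, meta, default

def predict_stacking(model, row):
    base, meta, default = model
    key = tuple(predict_stump(s, row) for s in base)
    return meta.get(key, default)

# Toy data: feature 0 is informative, feature 1 is noise.
X_base, y_base = [[0, 5], [1, 3], [9, 4], [10, 6]], [0, 0, 1, 1]
X_meta, y_meta = [[0.5, 6], [1.5, 2], [9.5, 3], [8.5, 7]], [0, 0, 1, 1]
model = fit_stacking(X_base, y_base, X_meta, y_meta, features=[0, 1])
pred_low = predict_stacking(model, [2, 9])
pred_high = predict_stacking(model, [7, 1])
```

In this toy case the meta-learner learns to follow the stump on feature 0 and ignore the noisy one. More generally, a meta-model can weight base classifiers by how reliable their predictions are on held-out data.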