Automatic classification of mice vocalizations using Machine Learning techniques and Convolutional Neural Networks

Analysis of ultrasonic vocalizations (USVs) is a well-recognized tool for investigating animal communication. It can be used for behavioral phenotyping of murine models of different disorders. USVs are usually recorded with a microphone sensitive to ultrasound frequencies and analyzed with dedicated software. Different call typologies exist, and each ultrasonic call can be classified manually, but this qualitative analysis is highly time-consuming. Within this framework, we proposed and evaluated a set of supervised learning methods for automatic USV classification. Such methods could provide a sustainable procedure for analyzing ultrasonic communication in depth, as well as a standardized analysis. We used manually built datasets obtained by segmenting the USV audio tracks analyzed with the Avisoft software and then labelling each segment into one of 10 representative classes. For the automatic classification task, we designed a Convolutional Neural Network that was trained on the spectrogram images associated with the segmented audio files. In addition, we tested other supervised learning algorithms, such as Support Vector Machine, Random Forest and Multilayer Perceptrons, using informative numerical features extracted from the spectrograms. The results showed that considering the whole time/frequency information of the spectrogram leads to significantly higher performance than considering a subset of numerical features. In the authors’ opinion, the experimental results may represent a valuable benchmark for future work in this research field.

Predicting Secondary School Students' Performance Utilizing a Semi-supervised Learning Approach

Journal of Educational Computing Research ◽

10.1177/0735633117752614 ◽

2018 ◽

Vol 57 (2) ◽

pp. 448-470 ◽

Cited By ~ 11

Author(s):

Ioannis E. Livieris ◽

Konstantina Drakopoulou ◽

Vassilis T. Tampakas ◽

Tassos A. Mikropoulos ◽

Panagiotis Pintelas

Keyword(s):

Supervised Learning ◽

Prediction Models ◽

Learning Algorithms ◽

Educational Data Mining ◽

Semisupervised Learning ◽

Research Field ◽

Machine Learning Techniques ◽

School Students ◽

Supervised Learning Algorithms ◽

And Performance

Educational data mining is a recent research field that has gained popularity over the last decade because of its ability to monitor students' academic performance and predict future progression. Numerous machine learning techniques, especially supervised learning algorithms, have been applied to develop accurate models for predicting the students' characteristics that drive their behavior and performance. In this work, we examine and evaluate the effectiveness of two wrapper methods for semisupervised learning algorithms for predicting students' performance in the final examinations. Our preliminary numerical experiments indicate that the advantage of semisupervised methods is that classification accuracy can be significantly improved by using a few labeled and many unlabeled data to develop reliable prediction models.
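
The abstract does not name the two wrapper methods it evaluates. As an illustration only, the sketch below shows self-training, one widely used wrapper scheme: a base classifier is fitted on the labeled data, confident predictions on unlabeled data are added as pseudo-labels, and the classifier is refitted. The nearest-centroid base classifier, the confidence score, the `threshold` value and the toy data are all assumptions made for this example, not the authors' setup.

```python
import math

def centroids(X, y):
    """Fit the base classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_with_confidence(model, x):
    """Return (label, confidence) using a softmax over negative distances."""
    dists = {label: math.dist(x, c) for label, c in model.items()}
    d_min = min(dists.values())
    # Subtracting d_min keeps the largest weight at exactly 1 (no underflow to 0/0).
    weights = {label: math.exp(-(d - d_min)) for label, d in dists.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Wrapper loop: pseudo-label confident unlabeled points, then refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    return centroids(X_lab, y_lab)

# Toy data (made up): two labeled students, four unlabeled ones.
model = self_train([[0, 0], [10, 10]], ["fail", "pass"],
                   [[1, 1], [9, 9], [0, 2], [10, 8]])
print(predict_with_confidence(model, [2, 2])[0])
```

Any classifier that can report a confidence could replace the nearest-centroid model here; the wrapper loop itself does not depend on it.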

Heart ID: Human Identification Based on Radar Micro-Doppler Signatures of the Heart Using Deep Learning

Remote Sensing ◽

10.3390/rs11101220 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1220 ◽

Cited By ~ 1

Author(s):

Peibei Cao ◽

Weijie Xia ◽

Yi Li

Keyword(s):

Supervised Learning ◽

Doppler Radar ◽

User Authentication ◽

Learning Algorithms ◽

Human Identification ◽

Optical Systems ◽

Support Vector ◽

Time Frequency ◽

Average Accuracy ◽

Supervised Learning Algorithms

Human identification based on radar signatures of individual heartbeats is crucial in various applications, including user authentication in mobile devices and identification of escaped criminals. Optical systems used to recognize humans are usually sensitive to ambient light, whereas radar has no such drawback, since it has high penetration and all-weather capability. Moreover, since the micro-Doppler characteristics of the heart differ between people and are not easy to fake, they can be used for identification. In this paper, we employed a deep convolutional neural network (DCNN) and conventional supervised learning methods to realize heartbeat-based identification. First, the heartbeat signals were acquired by a Doppler radar and processed by short-time Fourier transform. Then, predefined features were extracted for the conventional supervised learning algorithms, while time–frequency graphs were fed directly to the DCNN, since the network has its own feature extraction part. The DCNN achieved an average accuracy of 98.5% for identifying four people, and higher than 80% when the number of people was fewer than ten. For conventional supervised learning algorithms identifying four people, the accuracy of the support vector machine (SVM) was 88.75% and the accuracy of SVM–Bayes was 91.25%, while naive Bayes had the lowest accuracy of 80.75%.
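
The short-time Fourier transform step that turns a one-dimensional signal into a time-frequency graph can be sketched as follows. This is a minimal standard-library version that uses a naive DFT for clarity. The frame length, hop size, Hann window and the synthetic test tone are assumptions chosen for the example and are not taken from the paper.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes (bins 0..frame_len/2)."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed segment at frequency bin k.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Synthetic tone with exactly 8 cycles per 64-sample frame (illustrative only).
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram = stft_magnitude(tone)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
print(len(spectrogram), peak_bins)
```

Each row of `spectrogram` is one time step and each column one frequency bin, which is the kind of two-dimensional array a convolutional network can take as an image. For real signals an FFT routine would replace the naive DFT loop.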

Deep Learning Representation from Electroencephalography of Early-Stage Creutzfeldt-Jakob Disease and Features for Differentiation from Rapidly Progressive Dementia

International Journal of Neural Systems ◽

10.1142/s0129065716500398 ◽

2016 ◽

Vol 27 (02) ◽

pp. 1650039 ◽

Cited By ~ 57

Author(s):

Francesco Carlo Morabito ◽

Maurizio Campolo ◽

Nadia Mammone ◽

Mario Versaci ◽

Silvana Franceschetti ◽

...

Keyword(s):

Deep Learning ◽

Supervised Learning ◽

Early Stage ◽

Processing System ◽

Fine Tuning ◽

Permutation Entropy ◽

Support Vector ◽

Progressive Dementia ◽

Time Frequency ◽

Jakob Disease

A novel technique of quantitative EEG for differentiating patients with early-stage Creutzfeldt–Jakob disease (CJD) from other forms of rapidly progressive dementia (RPD) is proposed. The discrimination is based on the extraction of suitable features from the time-frequency representation of the EEG signals through continuous wavelet transform (CWT). An average measure of complexity of the EEG signal obtained by permutation entropy (PE) is also included. The dimensionality of the feature space is reduced through a multilayer processing system based on the recently emerged deep learning (DL) concept. The DL processor includes a stacked auto-encoder, trained by unsupervised learning techniques, and a classifier whose parameters are determined in a supervised way by associating the known category labels to the reduced vector of high-level features generated by the previous processing blocks. The supervised learning step is carried out by using either support vector machines (SVM) or multilayer neural networks (MLP-NN). A subset of EEG from patients suffering from Alzheimer’s Disease (AD) and healthy controls (HC) is considered for differentiating CJD patients. When fine-tuning the parameters of the global processing system by a supervised learning procedure, the proposed system is able to achieve an average accuracy of 89%, an average sensitivity of 92%, and an average specificity of 89% in differentiating CJD from RPD. Similar results are obtained for CJD versus AD and CJD versus HC.
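
Permutation entropy, used above as a complexity measure of the EEG signal, can be computed as the Shannon entropy of the ordinal patterns found in sliding windows of the signal. The sketch below is a generic standard-library version. The `order` and `delay` defaults, the normalization by log2(order!) and the short test series are assumptions for illustration, not the settings used in the paper.

```python
import math
from collections import Counter

def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (in bits) of the ordinal patterns in `signal`."""
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal is too short for this order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [signal[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        # Ties are broken by position because sorted() is stable.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = 0.0
    for count in patterns.values():
        p = count / n_windows
        entropy -= p * math.log2(p)
    if normalize:
        # Divide by the maximum possible entropy, log2(order!).
        entropy /= math.log2(math.factorial(order))
    return entropy

series = [4, 7, 9, 10, 6, 11, 3]
print(permutation_entropy(series, order=2))
```

With `order=2` the series has four rising and two falling pairs, so the result is the entropy of the distribution (2/3, 1/3), about 0.918 bits. A strictly monotone signal yields 0 and a signal in which all patterns are equally frequent yields 1 after normalization.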

An Opcode-Based Malware Detection Model Using Supervised Learning Algorithms

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2021100102 ◽

2021 ◽

Vol 15 (4) ◽

pp. 18-30

Author(s):

Om Prakash Samantray ◽

Satya Narayan Tripathy

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Malware Detection ◽

Support Vector ◽

Detection Accuracy ◽

Detection Techniques ◽

Detection Model ◽

Proposed Model ◽

Tree Classifier ◽

Supervised Learning Algorithms

Several available malware detection techniques rely on a signature-based approach. This approach detects known malware very effectively but may fail to detect unknown or zero-day attacks. In this article, the authors propose a malware detection model that uses the operation codes of malicious and benign executables as features. The proposed model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vector for the experiment. The most relevant features are selected using the extra tree classifier feature selection technique and then passed to several supervised learning algorithms, namely support vector machine, naive Bayes, decision tree, random forest, logistic regression, and k-nearest neighbour, to build classification models for malware detection. The proposed model achieved a detection accuracy of 98.7%, which makes it better than many of the similar works discussed in the literature.
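
The abstract does not describe the internals of the OPEC algorithm, so the following is only a hypothetical sketch of the general idea of counting opcodes to form a feature vector. The input format (one disassembled instruction per line, mnemonic first, `;` starting a comment), the function names and the vocabulary are all assumptions made for this example.

```python
from collections import Counter

def opcode_counts(disassembly_lines):
    """Count opcode mnemonics, assuming each line looks like 'mnemonic operands'."""
    counts = Counter()
    for line in disassembly_lines:
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if line:
            counts[line.split()[0].lower()] += 1
    return counts

def to_feature_vector(counts, vocabulary):
    """Order the counts by a fixed opcode vocabulary so all samples align."""
    return [counts.get(op, 0) for op in vocabulary]

# Hypothetical disassembly listing and vocabulary, for illustration only.
listing = [
    "push ebp",
    "mov ebp, esp",
    "MOV eax, [ebp+8]  ; first argument",
    "add eax, 1",
    "pop ebp",
    "ret",
]
vocabulary = ["mov", "push", "pop", "add", "call", "ret"]
print(to_feature_vector(opcode_counts(listing), vocabulary))
```

Using one fixed vocabulary for every executable keeps the feature vectors the same length, which is what a later feature-selection or classification step would require.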

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have been using machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.

Effect of normalization methods on the performance of supervised learning algorithms applied to HTSeq-FPKM-UQ data sets: 7SK RNA expression as a predictor of survival in patients with colon adenocarcinoma

Briefings in Bioinformatics ◽

10.1093/bib/bbx153 ◽

2017 ◽

Vol 20 (3) ◽

pp. 985-994 ◽

Cited By ~ 15

Author(s):

Leili Shahriyari

Keyword(s):

Supervised Learning ◽

Unit Length ◽

Learning Algorithms ◽

Colon Adenocarcinoma ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Maximum Accuracy ◽

Normalization Methods ◽

Supervised Learning Algorithms

Motivation: One of the main challenges in machine learning (ML) is choosing an appropriate normalization method. Here, we examine the effect of various normalization methods on analyzing FPKM upper quartile (FPKM-UQ) RNA sequencing data sets. We collect the HTSeq-FPKM-UQ files of patients with colon adenocarcinoma from the TCGA-COAD project. We compare the three most common normalization methods (scaling, standardizing using z-score, and vector normalization) by visualizing the normalized data set and evaluating the performance of 12 supervised learning algorithms on the normalized data set. Additionally, for each of these normalization methods, we use two different normalization strategies: normalizing samples (files) or normalizing features (genes). Results: Regardless of the normalization method, a support vector machine (SVM) model with the radial basis function kernel had the maximum accuracy (78%) in predicting the vital status of the patients. However, the fitting time of the SVM depended on the normalization method, and it reached its minimum when files were normalized to unit length. Furthermore, among all 12 learning algorithms and 6 different normalization techniques, the Bernoulli naive Bayes model after standardizing files had the best performance in terms of maximizing the accuracy as well as minimizing the fitting time. We also investigated the effect of dimensionality reduction methods on the performance of the supervised ML algorithms. Reducing the dimension of the data set did not increase the maximum accuracy of 78%. However, it led to the discovery of 7SK RNA gene expression as a predictor of survival in patients with colon adenocarcinoma, with an accuracy of 78%.
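
The three normalization methods and the two strategies (per sample or per feature) give the six combinations mentioned above. A minimal standard-library sketch of how they can be expressed follows. It assumes that "scaling" means min-max scaling to [0, 1] and that the z-score uses the population standard deviation. The function names and the toy matrix are made up for the example.

```python
import math

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def unit_length(values):
    """Divide by the Euclidean norm so the vector has length 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return list(values)
    return [v / norm for v in values]

def normalize(matrix, method, by="samples"):
    """Apply `method` to each row (samples/files) or each column (features/genes)."""
    if by == "samples":
        return [method(list(row)) for row in matrix]
    if by != "features":
        raise ValueError("by must be 'samples' or 'features'")
    columns = [method(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

# Toy expression matrix: 2 samples (rows) x 3 genes (columns); values are made up.
toy = [[3.0, 4.0, 0.0], [1.0, 2.0, 3.0]]
print(normalize(toy, unit_length, by="samples"))
print(normalize(toy, min_max_scale, by="features"))
```

Normalizing by samples rescales each file independently, whereas normalizing by features rescales each gene across all files, so the two strategies generally produce different inputs for the same learning algorithm.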

Practical foundations of machine learning for addiction research. Part I. Methods and techniques

10.31234/osf.io/ast53 ◽

2021 ◽

Author(s):

Pablo Cresta Morgado ◽

Martín Carusso ◽

Laura Alonso Alemany ◽

Laura Acion

Keyword(s):

Machine Learning ◽

Linear Models ◽

Principal Component ◽

Research Field ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Tools ◽

Wide Range ◽

Methods And Techniques ◽

Research Problems

Machine learning assembles a broad set of methods and techniques to solve a wide range of problems, such as identifying individuals with substance use disorders (SUD), finding patterns in neuroimages, understanding SUD prognostic factors and their association, or determining addiction genetic underpinnings. However, machine learning use in the addiction research field continues to be insufficient. This two-part review focuses on machine learning tools and concepts and provides insights into their capabilities to facilitate their understanding and acquisition by addiction researchers. In this first part, we present supervised and unsupervised methods and techniques such as linear models, naive Bayes, support vector machines, artificial neural networks, k-means, or principal component analysis and examples of how these tools are already in use in addiction research. We also provide open-source programming tools to apply these techniques. Throughout this work, we link machine learning techniques to applied statistics. Machine learning tools and techniques can be applied to many addiction research problems and can improve addiction research.

Identification Of Hepatocellular Carcinoma Using Supervised Learning Algorithms

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v14i3.1992 ◽

2014 ◽

Vol 14 (3) ◽

pp. 5535-5542

Author(s):

Sagri Sharma ◽

Sanjay Kadam ◽

Hemant Darbari

Keyword(s):

Gene Expression ◽

Hepatocellular Carcinoma ◽

Supervised Learning ◽

Dna Analysis ◽

Learning Algorithm ◽

Expression Profiles ◽

Learning Algorithms ◽

Support Vector ◽

Unseen Data ◽

Supervised Learning Algorithms

Analyzing diseases that involve multiple factors increases the complexity of the problem, and the development of frameworks for such analysis is therefore a topic of intense research. Because the various parameters are interdependent, traditional methodologies have not been very effective. Consequently, newer methodologies are being sought to deal with the problem. Supervised learning algorithms are commonly used for making predictions on previously unseen data. They are applied in fields ranging from image analysis to protein structure and function prediction, and they are trained on a known dataset to produce a predictor model that generates reasonable predictions for the response to new data. Gene expression profiles generated by DNA analysis experiments can be quite complex, since these experiments can involve hypotheses about entire genomes. A well-known machine learning algorithm, the Support Vector Machine, is therefore applied to analyze the expression levels of thousands of genes simultaneously in a timely, automated and cost-effective way. The objectives of the presented work are the development of a methodology to identify genes relevant to Hepatocellular Carcinoma (HCC) from a gene expression dataset using supervised learning algorithms and statistical evaluations, along with the development of a predictive framework that can perform classification tasks on new, unseen data.

Recognition of Impulse of Love at First Sight Based On Photoplethysmography Signal

Sensors ◽

10.3390/s20226572 ◽

2020 ◽

Vol 20 (22) ◽

pp. 6572

Author(s):

Huan Lu ◽

Guangjie Yuan ◽

Jin Zhang ◽

Guangyuan Liu

Keyword(s):

Feature Selection ◽

Pulse Signal ◽

Recognition Algorithm ◽

Machine Learning Techniques ◽

Support Vector ◽

Signal Acquisition ◽

Interesting Phenomenon ◽

Linear Discriminant ◽

Time Frequency ◽

Gradient Enhancement

Love at first sight is a well-known and interesting phenomenon, and denotes the strong attraction to a person of the opposite sex when first meeting. As far as we know, there are no studies on the changes in physiological signals between the opposite sexes when this phenomenon occurs. Although privacy is involved, knowing how attractive a partner is may be beneficial to building a future relationship in an open society where both men and women accept each other. Therefore, this study adopts the photoplethysmography (PPG) signal acquisition method (already applied in wearable devices) to collect signals that are beneficial for utilizing the results of the analysis. In particular, this study proposes a love pulse signal recognition algorithm based on a PPG signal. First, given the high correlation between the impulse signals of love at first sight and those for physical attractiveness, photos of people with different levels of attractiveness are used to induce real emotions. Then, the PPG signal is analyzed in the time, frequency, and nonlinear domains in order to extract its physiological characteristics. Finally, we propose the use of a variety of machine learning techniques (support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), and extreme gradient boosting (XGBoost)) for identifying the impulsive states of love, with or without feature selection. The results show that the XGBoost classifier has the highest classification accuracy (71.09%) when feature selection is used.

A Comparison of the Performance of Supervised Learning Algorithms for Solar Power Prediction

Energies ◽

10.3390/en14154424 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4424

Author(s):

Leidy Gutiérrez ◽

Julian Patiño ◽

Eduardo Duque-Grisales

Keyword(s):

Machine Learning ◽

Power Generation ◽

Large Scale ◽

Fossil Fuels ◽

Machine Learning Techniques ◽

Support Vector ◽

Power Prediction ◽

Electric Networks ◽

K Nearest Neighbors ◽

Supervised Learning Algorithms

Science seeks strategies to mitigate global warming and reduce the negative impacts of the long-term use of fossil fuels for power generation. In this sense, implementing and promoting renewable energy in different ways becomes one of the most effective solutions. The inaccuracy in the prediction of power generation from photovoltaic (PV) systems is a significant concern for the planning and operational stages of interconnected electric networks and the promotion of large-scale PV installations. This study proposes the use of Machine Learning techniques to model the photovoltaic power production for a system in Medellín, Colombia. Four forecasting models were generated from techniques compatible with Machine Learning and Artificial Intelligence methods: K-Nearest Neighbors (KNN), Linear Regression (LR), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). The results obtained indicate that the four methods produced adequate estimations of photovoltaic energy generation. However, the best estimate according to RMSE and MAE is the ANN forecasting model. The proposed Machine Learning-based models were demonstrated to be practical and effective solutions to forecast PV power generation in Medellín.
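
The two error measures used to rank the forecasting models, root mean squared error (RMSE) and mean absolute error (MAE), can be computed as in the sketch below. The function names and the sample values are illustrative assumptions and are not data from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large deviations more strongly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the deviations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up measured and forecast PV power values (arbitrary units).
measured = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(rmse(measured, forecast), mae(measured, forecast))
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, so reporting both gives a fuller picture of forecast quality.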
