An evaluation of machine learning classifiers for next-generation, continuous-ethogram smart trackers

Abstract

Background: Our understanding of wildlife movement patterns and behaviours has advanced greatly through improved tracking technologies, including the application of accelerometry (ACC) across a wide range of taxa. However, most ACC studies either use intermittent sampling, which breaks continuity, or log data continuously and rely on retrieving the tracker to download the data, which is impractical for long-term studies. To enable long-term, fine-scale behavioural research, we evaluated a range of machine learning methods for their suitability for continuous on-board classification of ACC data into behaviour categories prior to data transmission.

Methods: We tested six supervised machine learning methods, namely linear discriminant analysis (LDA), decision tree (DT), support vector machine (SVM), artificial neural network (ANN), random forest (RF) and extreme gradient boosting (XGBoost), to classify behaviour using ACC data from three bird species (white stork Ciconia ciconia, griffon vulture Gyps fulvus and common crane Grus grus) and two mammals (dairy cow Bos taurus and roe deer Capreolus capreolus).

Results: Across a range of quality criteria, SVM, ANN, RF and XGBoost performed well in determining behaviour from ACC data, and their performance appeared little affected when the number of input features for model training was greatly reduced. On-board runtime and storage-requirement tests showed that ANN, RF and XGBoost in particular would make suitable on-board classifiers.

Conclusions: Our finding that ANN, RF and XGBoost, combined with feature reduction, are suitable for on-board behavioural classification of continuous ACC data has considerable potential to benefit movement ecology and behavioural research, wildlife conservation and livestock husbandry.
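
Classifiers of this kind usually operate on short windows of tri-axial ACC samples that are first summarised into a handful of features, and reducing the number of input features amounts to keeping only a subset of them. The sketch below is a minimal illustration under assumptions: the feature set (per-axis mean and standard deviation plus overall dynamic body acceleration, ODBA), the use of the window mean as the static (gravity) component, and the names `window_features` and `reduce_features` are hypothetical choices, not the feature set evaluated in the study.

```python
from statistics import mean, pstdev


def window_features(xs, ys, zs):
    """Summarise one window of tri-axial ACC samples (hypothetical feature set)."""
    if not (len(xs) == len(ys) == len(zs)) or not xs:
        raise ValueError("axes must be non-empty and of equal length")
    feats = {}
    for name, axis in (("x", xs), ("y", ys), ("z", zs)):
        feats[f"mean_{name}"] = mean(axis)
        feats[f"sd_{name}"] = pstdev(axis)
    # ODBA: mean over samples of the summed absolute dynamic acceleration.
    # Assumption: the static (gravity) component is approximated by the window mean.
    mx, my, mz = feats["mean_x"], feats["mean_y"], feats["mean_z"]
    feats["odba"] = mean(
        abs(x - mx) + abs(y - my) + abs(z - mz) for x, y, z in zip(xs, ys, zs)
    )
    return feats


def reduce_features(feats, keep):
    """Feature reduction: keep only the named subset of features."""
    return {k: feats[k] for k in keep}


# Usage: a short window with a little movement on the x axis only.
f = window_features([0.0, 0.2, -0.2, 0.0], [0.0] * 4, [1.0] * 4)
small = reduce_features(f, ["odba", "sd_x"])
```

A trained classifier (for example an RF or XGBoost model deployed on the tag) would then be applied to each reduced feature set; that step is not shown here.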

Machine learning for the extragalactic astronomy educational manual

Proceedings of the International Astronomical Union, 2019, Vol 15 (S367), pp. 461-463. DOI: 10.1017/s1743921321000132

Author(s): Maksym Vasylenko, Daria Dobrycheva

Keyword(s): Machine Learning, Support Vector Machine, Programming Language, Supervised Machine Learning, Support Vector, Learning Methods, Machine Learning Methods, Python Programming Language, Python Programming

Abstract

We evaluated a new approach to the automated morphological classification of large galaxy samples based on supervised machine learning techniques (Naive Bayes, Random Forest, Support Vector Machine, Logistic Regression and k-Nearest Neighbours) and deep learning, using the Python programming language. A representative sample of ∼315,000 SDSS DR9 galaxies at z < 0.1 with stellar magnitudes r < 17.7m was taken as the target sample of galaxies with undetermined morphological types. Classical machine learning methods were used for binary morphological classification of galaxies into early and late types (96.4% with the Support Vector Machine). Deep learning methods were used to classify galaxy images into five visual types (completely rounded, rounded in-between, smooth cigar-shaped, edge-on and spiral) with the Xception architecture (94% accuracy for four classes and 88% for cigar-shaped galaxies). These results form the basis of an educational manual on processing large data sets in the Python programming language, intended for students at Ukrainian universities.

Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours

PLoS ONE, 2016, Vol 11 (12), pp. e0166898. DOI: 10.1371/journal.pone.0166898. Cited by ~15

Author(s): Monique A. Ladds, Adam P. Thompson, David J. Slip, David P. Hocking, Robert G. Harcourt

Keyword(s): Machine Learning, Supervised Machine Learning, Learning Methods, Machine Learning Methods

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine, 2020, Vol 18 (1). DOI: 10.1186/s12967-020-02550-2

Author(s): Kerry E. Poppenberg, Vincent M. Tutino, Lu Li, Muhammad Waqas, Armond June, ...

Keyword(s): Machine Learning, Random Forest, Prediction Models, Model Performance, Supervised Machine Learning, Support Vector, Learning Methods, Training Cohort, Network Analyses, Machine Learning Methods

Abstract

Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models with 4 well-established supervised machine learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models on the remaining samples (n = 40) and assessed model performance with receiver operating characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained on these transcripts reached a maximum accuracy of 90% in the testing cohort. Across all methods, the average area under the ROC curve (AUC) in testing was 0.97, an improvement over our previous models. The Random Forest model performed best in both the training and testing cohorts. RT-qPCR confirmed expression differences in 7 of the 9 genes tested. Gene ontology and IPA network analyses of the 37 model genes reflected dysregulated inflammation, cell signaling and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.

Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing the sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.
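
The AUC values reported above can be read as the probability that a randomly chosen IA patient receives a higher model score than a randomly chosen control. A minimal pure-Python sketch of that pairwise (Mann-Whitney) computation, counting tied scores as one half, is shown below; the function name `roc_auc` and the example scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive case (e.g. IA present), 0 for a control.
    A tie in score counts as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with made-up scores: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is quadratic in cohort size, which is negligible for a testing cohort of 40 samples; rank-based formulas give the same value more efficiently for large data sets.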

Hydraulic Flow Unit Classification and Prediction Using Machine Learning Techniques: A Case Study from the Nam Con Son Basin, Offshore Vietnam

Energies, 2021, Vol 14 (22), pp. 7714. DOI: 10.3390/en14227714

Author(s): Ha Quang Man, Doan Huy Hien, Kieu Duy Thong, Bui Viet Dung, Nguyen Minh Hoa, ...

Keyword(s): Machine Learning, Machine Learning Algorithms, Flow Unit, Supervised Machine Learning, Support Vector, Learning Methods, Log Data, Hydraulic Flow, Core Data, Machine Learning Methods

The test study area is the Miocene reservoir of the Nam Con Son Basin, offshore Vietnam. In this study, we used unsupervised learning to automatically cluster hydraulic flow units (HUs) based on flow zone indicators (FZI) in a core plug dataset, and then applied supervised learning to predict HUs by combining core and well log data. We tested several machine learning algorithms. In the first phase, we clustered core porosity and permeability data into hydraulic flow units using unsupervised machine learning methods: Ward's method, K-means, Self-Organizing Map (SOM) and Fuzzy C-means (FCM). We then applied supervised machine learning methods, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), Boosted Trees (BT) and Random Forest (RF), combining core and log data to predict HU logs over the full section of wells without core data. We used four wells with six logs (GR, DT, NPHI, LLD, LSS and RHOB) and 578 cores from the Miocene reservoir to train, validate and test the models. Our goal was to show that the right combination of core and well log data can give reservoir engineers a tool for HU classification and permeability estimation along a continuous geological profile. Our research showed that machine learning effectively improves permeability prediction, reduces uncertainty in reservoir modeling and improves project economics.
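
The flow zone indicator used for the HU clustering is conventionally derived from core porosity and permeability through the reservoir quality index (RQI). A minimal sketch of that standard formulation follows, assuming permeability in millidarcies and effective porosity as a fraction, for which the constant 0.0314 gives RQI and FZI in micrometres; the function name `flow_zone_indicator` is illustrative and not taken from the study.

```python
import math


def flow_zone_indicator(k_md, phi):
    """Return (RQI, FZI), both in micrometres, for one core plug.

    k_md: permeability in millidarcies (assumed unit).
    phi:  effective porosity as a fraction, 0 < phi < 1.
    """
    if k_md <= 0 or not 0 < phi < 1:
        raise ValueError("need k_md > 0 and 0 < phi < 1")
    rqi = 0.0314 * math.sqrt(k_md / phi)  # reservoir quality index
    phi_z = phi / (1.0 - phi)             # normalised porosity (pore-to-grain volume ratio)
    return rqi, rqi / phi_z               # FZI = RQI / phi_z


# Usage: a 100 mD plug with 25% porosity.
rqi, fzi = flow_zone_indicator(100.0, 0.25)
```

Plugs with similar FZI values belong to the same HU, so the clustering step can operate on FZI (often on its logarithm) with methods such as those listed above.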

A Comparative Study of Machine Learning Methods for Image Classification of Hiragana Vowel Letters

JURNAL MEDIA INFORMATIKA BUDIDARMA, 2021, Vol 5 (3), pp. 905. DOI: 10.30865/mib.v5i3.3083

Author(s): Muhammad Afrizal Amrustian, Vika Febri Muliati, Elsa Elvira Awal

Keyword(s): Machine Learning, Comparative Study, Image Classification, Nearest Neighbor, Support Vector, K Nearest Neighbor, Learning Methods, Machine Learning Methods, The Comparative Study

Japanese is one of the most difficult languages to read and understand, largely because its writing does not use the Latin alphabet. Japanese has three scripts, namely kanji, katakana and hiragana, of which hiragana is the most commonly used. Hiragana is also cursive in nature, so each person's handwriting differs. Machine learning methods can read Japanese characters by recognizing images of them. This study focuses on the hiragana vowels and compares machine learning methods for classifying their images: Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest and K-Nearest Neighbor. The comparison shows that K-Nearest Neighbor is the best of these methods for image classification of hiragana vowels, achieving an accuracy of 89.4% with a low error rate.
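
K-Nearest Neighbor, the best performer in this comparison, labels an image with the majority class among the k training images closest to it. A minimal sketch over flattened pixel vectors with Euclidean distance is shown below; the value of k, the distance metric, the toy 2x2 "images" and the romanized labels "a" and "i" are assumptions for illustration, not the study's setup.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every flattened training image.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Usage with toy 2x2 images flattened to 4 pixel values (hypothetical data).
train_x = [[1, 0, 0, 1], [1, 0, 0, 0.9], [0, 1, 1, 0], [0, 1, 0.9, 0]]
train_y = ["a", "a", "i", "i"]
label = knn_predict(train_x, train_y, [0.9, 0, 0, 1], k=3)
```

Because k-NN stores all training images and compares against each one at prediction time, its cost grows with the training set; real image pipelines typically resize and normalise images before flattening.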

Machine Learning Applications for Mass Spectrometry-Based Metabolomics

Metabolites, 2020, Vol 10 (6), pp. 243. DOI: 10.3390/metabo10060243. Cited by ~7

Author(s): Ulf W. Liebal, An N. T. Phan, Malvika Sudhakar, Karthik Raman, Lars M. Blank

Keyword(s): Machine Learning, Mass Spectrometry, Data Analysis, Metabolic Engineering, Data Representation, Heterogeneous Data, Supervised Machine Learning, Support Vector, Learning Methods, Machine Learning Methods

The metabolome of an organism depends on environmental factors and intracellular regulation and provides information about physiological conditions. Metabolomics helps to understand disease progression in clinical settings and to estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis because of their inherently nonlinear data representation and their ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning to process MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of its ability to supply quantitative predictions. We review commonly used tools, such as random forests, support vector machines, artificial neural networks and genetic algorithms. During processing, supervised machine learning methods help with peak picking, normalization and missing-data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of recent publications also highlights that data quality determines analysis quality and adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering and stimulate fundamental biological discoveries.

Supervised machine learning methods in psychology: A practical introduction with annotated R code

2019. DOI: 10.31234/osf.io/s72vu

Author(s): Hannes Rosenbusch, Felix Soldner, Anthony M Evans, Marcel Zeelenberg

Keyword(s): Machine Learning, Prediction Models, Psychological Research, Supervised Machine Learning, Support Vector, Learning Methods, Comprehensive Overview, K Nearest Neighbors, Machine Learning Methods, Out Of Sample

Machine learning methods for pattern detection and prediction are increasingly prevalent in psychological research. We provide a comprehensive overview of machine learning, its applications, and how to implement models for research. We review fundamental concepts of machine learning, such as prediction accuracy and out-of-sample evaluation, and summarize four standard prediction algorithms: linear regressions, ridge regressions, decision trees, and random forests (plus k-nearest neighbors, Naïve Bayes classifiers, and support vector machines in the supplementary material). This selection provides a set of powerful models that are implemented regularly in machine learning projects. We demonstrate each method with examples and annotated R code, and discuss best practices for determining sample sizes; comparing model performances; tuning prediction models; preregistering prediction models; and reporting results. Finally, we discuss the value of machine learning methods in maintaining psychology’s status as a predictive science.
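
Out-of-sample evaluation, one of the concepts this paper reviews, is commonly done with k-fold cross-validation: each observation is held out exactly once while the model is fit on the remaining data. The paper's own examples are in R; the sketch below is in Python and only shows how the folds are formed. The function name `kfold_indices`, the fixed shuffling seed and the contiguous-fold scheme are assumptions for illustration.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are not ordered by row
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# Usage: 10 observations split into 3 folds of sizes 4, 3 and 3.
folds = list(kfold_indices(10, 3))
```

In each fold a model would be fit on `train` and scored on `test`, and the k scores averaged to estimate out-of-sample performance.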

Classification of Rock Mineral in Field X based on Spectral Data (SWIR & TIR) using Supervised Machine Learning Methods

IOP Conference Series: Earth and Environmental Science, 2021, Vol 830 (1), pp. 012042. DOI: 10.1088/1755-1315/830/1/012042

Author(s): S A Pane, F M H Sihombing

Keyword(s): Machine Learning, Spectral Data, Supervised Machine Learning, Learning Methods, Machine Learning Methods, Rock Mineral

Classification of Single Cell Types using Small Sets of Expressed Genes: Comparative Analysis of Supervised Machine Learning Methods

2021. DOI: 10.1109/bibm52615.2021.9669844

Author(s): Aleksandar Veljkovic, Mirjana Maljkovic, Nenad Mitic, Sasa Malkov, Minjie Lyu, ...

Keyword(s): Machine Learning, Comparative Analysis, Single Cell, Cell Types, Supervised Machine Learning, Learning Methods, Machine Learning Methods

Evaluation of Supervised Learning Models in Predicting Greenhouse Energy Demand and Production for Intelligent and Sustainable Operations

Energies, 2021, Vol 14 (19), pp. 6297. DOI: 10.3390/en14196297

Author(s): Laila Ouazzani Chahidi, Marco Fossa, Antonella Priarone, Abdellah Mechaqrane

Keyword(s): Machine Learning, Intelligent Control, Energy Demand, Well Being, Supervised Machine Learning, Support Vector, Photovoltaic Module, Learning Methods, Sustainable Operations, Machine Learning Methods

Plants need a specific environment to grow and reproduce well. However, climatic conditions are not stable and can affect plant well-being and, consequently, harvest quality. Greenhouse cultivation is therefore a suitable agricultural technique for creating and controlling an indoor microclimate adequate for plant growth. The relevance of greenhouse control is widely recognized, and predicting greenhouse variables with artificial intelligence methods is of great interest for intelligent control and for potentially reducing energy and financial losses. However, studies in this area remain limited, and several machine learning methods have not been sufficiently exploited. The aim of this study is to predict the electrical consumption of the air conditioning and the electrical production of the photovoltaic modules at the smart Agro-Manufacturing Laboratory (SamLab) greenhouse, located in Albenga, north-western Italy. Different supervised machine learning methods were compared, namely Artificial Neural Networks (ANNs), Gaussian Process Regression (GPR), Support Vector Machine (SVM) and Boosting trees. We evaluated model performance using three statistical indicators: the coefficient of correlation (R), the normalized root mean square error (nRMSE) and the normalized mean absolute error (nMAE). The results show good agreement between measured and predicted values for all models, with a correlation coefficient R > 0.9 on the validation set. The models' good performance supports this approach and indicates that it can be used to further improve greenhouse efficiency through intelligent control.
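
The three indicators can be computed directly from paired measured and predicted values. Normalisation conventions for nRMSE and nMAE differ between studies; the sketch below assumes both are divided by the mean of the measured values, which may not match the paper's definition. The function name `evaluate` and the example series are illustrative assumptions.

```python
import math
from statistics import mean


def evaluate(measured, predicted):
    """Return (R, nRMSE, nMAE); errors are normalised by the mean of `measured` (assumption)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mu_m, mu_p = mean(measured), mean(predicted)
    # Pearson correlation coefficient R.
    cov = sum((m - mu_m) * (p - mu_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mu_m) ** 2 for m in measured)
    var_p = sum((p - mu_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_m * var_p)
    n = len(measured)
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    return r, rmse / mu_m, mae / mu_m


# Usage with a made-up series where only the last prediction is off by one unit.
r, nrmse, nmae = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
```

The function raises `ZeroDivisionError` if either series is constant or the measured mean is zero; normalising by the range of the measured values is a common alternative when the mean is close to zero.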