Animal Activity Recognition From Sensor Data Using Ensemble Learning

Animal Activity

Animal activity recognition is an important task for monitoring the behavior of animals in order to assess their health condition and psychological state. To address this need, this study aims to build an Internet of Things (IoT) system that predicts the activities of animals based on sensor data obtained from embedded devices attached to the animals. This chapter specifically considers the problem of predicting goat activity using three types of sensors: accelerometer, gyroscope, and magnetometer. Five goat activities are of interest: stationary, grazing, walking, trotting, and running. The utility of five ensemble learning methods was investigated: random forest, extremely randomized trees, bagging trees, gradient boosting, and extreme gradient boosting. The results showed that all of these methods achieved good performance (>94%) on the datasets. The approach can therefore be useful to professionals such as farmers, vets, and animal behaviorists, for whom animal tracking may be crucial.
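As a rough illustration of how several of the ensemble methods named above could be compared, the following sketch uses scikit-learn on synthetic data. Everything about the data is an assumption: the nine feature columns merely stand in for three axes of three sensors, and the five classes stand in for the five activities. Extreme gradient boosting is left out because it lives in the separate xgboost package, and none of the settings come from the chapter itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for sensor features (assumption, not the goat data):
# 9 columns mimic 3 axes x 3 sensors, 5 classes mimic the five activities.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extremely randomized trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    # BaggingClassifier uses decision trees as its default base estimator.
    "bagging trees": BaggingClassifier(n_estimators=50, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The scores printed here say nothing about the reported >94% figure; they only show the shape of such a comparison.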

A Multi-Feature Ensemble Learning Classification Method for Ship Classification with Space-Based AIS Data

Applied Sciences ◽

10.3390/app112110336 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10336

Author(s):

Yitao Wang ◽

Lei Yang ◽

Xin Song ◽

Quan Chen ◽

Zhenguo Yan

Keyword(s):

Time Series ◽

Random Forest ◽

Ensemble Learning ◽

Classification Model ◽

Gradient Boosting ◽

Identification System ◽

Dynamic Feature ◽

Passenger Ships ◽

Ship Classification

AIS (Automatic Identification System) is an effective navigation aid system intended to support ship monitoring and collision avoidance. Space-based AIS data, which are received by satellites, have become a popular and promising source of ship information around the world. To recognize the types of ships from the massive space-based AIS data, we propose a multi-feature ensemble learning classification model (MFELCM). The method consists of three steps. First, the static and dynamic information of the original data is preprocessed, and features are extracted to obtain static feature samples, dynamic feature distribution samples, time-series samples, and time-series feature samples. Second, four base classifiers, namely Random Forest, 1D-CNN (one-dimensional convolutional neural network), Bi-GRU (bidirectional gated recurrent unit), and XGBoost (extreme gradient boosting), are trained on these four types of samples, respectively. Finally, the base classifiers are integrated by another Random Forest, which outputs the final ship classification. In this paper, we use the global space-based AIS data of passenger ships, cargo ships, fishing boats, and tankers. The model achieves an overall accuracy of 0.9010 and an F1 score of 0.9019. The experiments show that MFELCM outperforms the base classifiers. In addition, MFELCM can achieve near real-time online classification, which has important applications in ship behavior anomaly detection and maritime supervision.
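The three-step structure described above (separate feature views, one base classifier per view, a Random Forest combining their outputs) can be sketched with scikit-learn. This is only an illustration under stated assumptions: the data are synthetic, there are two feature views instead of four, and a gradient boosting classifier stands in for the neural and XGBoost base models. Using out-of-fold probabilities to train the combiner is a common stacking practice, not something the abstract specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic 4-class data standing in for the four ship types (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=8,
    n_classes=4, random_state=0,
)

# Two hypothetical feature views, e.g. "static" and "dynamic" columns.
views = [X[:, :6], X[:, 6:]]
idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.25, stratify=y, random_state=0
)

base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

meta_train, meta_test = [], []
for model, view in zip(base_models, views):
    # Out-of-fold class probabilities keep the combiner from seeing
    # predictions made on data the base model was trained on.
    meta_train.append(
        cross_val_predict(model, view[idx_train], y[idx_train],
                          cv=5, method="predict_proba")
    )
    model.fit(view[idx_train], y[idx_train])
    meta_test.append(model.predict_proba(view[idx_test]))

# A second Random Forest integrates the base classifiers' outputs.
meta_model = RandomForestClassifier(n_estimators=200, random_state=0)
meta_model.fit(np.hstack(meta_train), y[idx_train])
stacked_pred = meta_model.predict(np.hstack(meta_test))
accuracy = float((stacked_pred == y[idx_test]).mean())
print(f"stacked accuracy: {accuracy:.3f}")
```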

Performing technical analysis to predict Japan REITs' movement through ensemble learning

Journal of Property Investment and Finance ◽

10.1108/jpif-01-2020-0007 ◽

2020 ◽

Vol 38 (6) ◽

pp. 551-562

Author(s):

Wei Kang Loo

Keyword(s):

Random Forest ◽

Ensemble Learning ◽

Forecast Accuracy ◽

Gradient Boosting ◽

Learning Models ◽

Content Type ◽

Technical Indicators ◽

Back Testing ◽

Trading Model

Purpose: The purpose of this study is to evaluate the performance of ensemble learning models, such as the Random Forest and Extreme Gradient Boosting models, in predicting the direction of Japan real estate investment trusts (J-REITs) at different return horizons, based on input obtained from various technical indicators. Design/methodology/approach: This study measures the predictability of J-REITs with technical indicators by using different horizons of REITs' return and machine learning models. The ensemble learning models include Random Forest and Extreme Gradient Boosting models, while the return horizons of REITs range from 1 to 300 days. The results were further split into individual years to check for the consistency of the performance across time. Findings: Extreme Gradient Boosting appears to be the best method for improving forecast accuracy, but not the trading return. Wider return horizons seemed to deliver relatively better performance in both forecast accuracy and trading return when compared to a return horizon of one day. Practical implications: It is recommended that practitioners consider the Extreme Gradient Boosting and Random Forest models for back-testing trading models. In addition, selecting different return horizons to achieve better trading or investment performance should also be considered. Originality/value: The predictability of J-REITs using technical indicators was compared among different return horizons and models (Extreme Gradient Boosting and Random Forest).
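To make the idea of a return horizon concrete, the sketch below builds up/down direction labels for a given horizon from a price series. The labeling rule (1 if the price `horizon` days ahead is higher than today's price, else 0) and the function name `direction_labels` are assumptions for illustration; the abstract does not state how the study defined its targets.

```python
def direction_labels(prices, horizon):
    """Return 1 where the price `horizon` steps ahead is higher, else 0.

    The last `horizon` observations have no future price, so the result
    is shorter than the input by `horizon` elements.
    """
    if horizon < 1:
        raise ValueError("horizon must be a positive number of days")
    return [
        1 if prices[t + horizon] > prices[t] else 0
        for t in range(len(prices) - horizon)
    ]


# Hypothetical closing prices, not J-REIT data.
prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0]
labels_1d = direction_labels(prices, horizon=1)
labels_3d = direction_labels(prices, horizon=3)
print(labels_1d)  # [1, 0, 1, 1, 0]
print(labels_3d)  # [1, 1, 1]
```

Labels like these would then serve as the classification target, with technical indicators computed up to day t as the inputs.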

An effective adaptive customization framework for small manufacturing plants using extreme gradient boosting-XGBoost and random forest ensemble learning algorithms in an Industry 4.0 environment

Machine Learning with Applications ◽

10.1016/j.mlwa.2021.100024 ◽

2021 ◽

pp. 100024

Author(s):

Sonia Kahiomba Kiangala ◽

Zenghui Wang

Keyword(s):

Random Forest ◽

Ensemble Learning ◽

Industry 4.0 ◽

Learning Algorithms ◽

Gradient Boosting ◽

Manufacturing Plants ◽

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
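Overall accuracy, the metric quoted above, is the share of correctly classified samples, which is the sum of the diagonal of the confusion matrix divided by the total count. The sketch below shows the calculation; the counts are hypothetical and are not taken from the study.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


# Hypothetical counts: rows are ground truth, columns are predictions,
# in the order (terrace, non-terrace).
confusion = [
    [45, 5],
    [3, 47],
]
accuracy = overall_accuracy(confusion)
print(f"{accuracy:.2%}")  # 92.00%
```

Because overall accuracy weights every sample equally, it can look high on imbalanced classes, which is presumably one reason the abstract also discusses class imbalance.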

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. Five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine) were trained on the training set. The prediction models were validated on the test set. Results: The random forest model performed best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
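The stratified 80/20 split described in the Methods can be sketched in plain Python: each class is shuffled and divided separately so that both sets keep the same proportion of difficult cases. The helper name `stratified_split`, the seed, and the label counts are illustrative assumptions, not details from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split at test_fraction."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

With a rare positive class, splitting this way avoids a test set that happens to contain very few difficult cases.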

Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features and tree-rings

Environmental Earth Sciences ◽

10.1007/s12665-021-10054-5 ◽

2021 ◽

Vol 80 (22) ◽

Author(s):

Hossein Sahour ◽

Vahid Gholami ◽

Javad Torkaman ◽

Mehdi Vazifedan ◽

Sirwe Saeedi

Keyword(s):

Random Forest ◽

Tree Rings ◽

Gradient Boosting ◽

Boosting Algorithms ◽

Streamflow Modeling

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregating). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
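The two evaluation measures mentioned above can be computed as follows. This sketch assumes the usual coefficient-of-determination definition of R² (one minus the ratio of residual to total sum of squares); the abstract does not say which definition was used, and the numbers are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical abundance values, not eBird data.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(round(rmse(observed, predicted), 4))       # 0.7071
print(round(r_squared(observed, predicted), 4))  # 0.9
```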

Modeling and analysis of COVID-19 new deaths using tree-based ensemble

10.36227/techrxiv.16566012.v1 ◽

2021 ◽

Author(s):

Ibrahim Abaker Targio Hashem ◽

Raja Sher Afgun Usmani ◽

Asad Ali Shah ◽

Abdulwahab Ali Almazroi ◽

Muhammad Bilal

Keyword(s):

Infectious Disease ◽

United States ◽

Random Forest ◽

Economic Activity ◽

The United States ◽

Gradient Boosting ◽

Health Crisis ◽

Modeling And Analysis ◽

The World

The COVID-19 pandemic has emerged as the world's most serious health crisis, affecting millions of people all over the world. The majority of nations have imposed nationwide curfews and reduced economic activity to combat the spread of this infectious disease. Governments are monitoring the situation and making critical decisions based on the daily number of new cases and deaths reported. Therefore, this study aims to predict the daily new deaths using four tree-based ensemble models, namely Gradient Tree Boosting (GB), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Voting Regressor (VR), for the three most affected countries: the United States, Brazil, and India. The results showed that VR outperformed the other models in predicting daily new deaths for all three countries. The predictions of daily new deaths made using VR for Brazil and India are very close to the actual new deaths, whereas the prediction of daily new deaths for the United States still needs to be improved.
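A voting regressor combines base models by averaging their predictions for each sample. The sketch below assumes a plain (optionally weighted) average, which is how scikit-learn's `VotingRegressor` behaves; the abstract does not describe the implementation used, and the function name and prediction values are hypothetical.

```python
def voting_predict(base_predictions, weights=None):
    """Combine per-model prediction lists into one weighted average per sample."""
    n_models = len(base_predictions)
    if weights is None:
        weights = [1.0] * n_models  # unweighted vote by default
    total_weight = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, base_predictions))
        / total_weight
        for i in range(len(base_predictions[0]))
    ]


# Hypothetical daily-death predictions from three base models.
gb_pred = [100.0, 120.0, 90.0]
rf_pred = [110.0, 115.0, 95.0]
xgb_pred = [105.0, 125.0, 85.0]
combined = voting_predict([gb_pred, rf_pred, xgb_pred])
print(combined)  # [105.0, 120.0, 90.0]
```

Averaging tends to cancel errors that the base models do not share, which is one plausible reason a voting ensemble can outperform its members.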

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to malware detection, such as packet content analysis, are ineffective when dealing with encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses, and similar metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
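The area under the curve reported above is presumably the area under the ROC curve, although the abstract does not say so explicitly. Under that assumption, it equals the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that definition; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = malicious, 0 = benign.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.7778
```

The pairwise loop is quadratic in the number of samples, so it suits small illustrations; library implementations use sorting instead.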