Random forests for time-dependent processes

ESAIM Probability and Statistics ◽

10.1051/ps/2020015 ◽

2020 ◽

Vol 24 ◽

pp. 801-826

Author(s):

Benjamin Goehry

Keyword(s):

Time Series ◽

Random Forest ◽

Rate Of Convergence ◽

Random Forests ◽

Time Dependent ◽

Dependent Processes ◽

Weakly Dependent ◽

Independent And Identically Distributed

Random forests were introduced by Breiman in 2001. We study theoretical aspects of both Breiman’s original random forests and a simplified version, the centred random forests. Under the independent and identically distributed (i.i.d.) hypothesis, Scornet, Biau and Vert proved the consistency of Breiman’s random forest, while Biau studied the simplified version and obtained a rate of convergence in the sparse case. However, the i.i.d. hypothesis is generally not satisfied, for example when dealing with time series. We extend the previous results to the case where observations are weakly dependent, more precisely when the sequences are stationary β-mixing.
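To illustrate the simplified estimator named in the abstract, here is a minimal Python sketch of a centred random forest. It assumes the usual description: each node picks a coordinate uniformly at random and splits its cell at the midpoint, and a tree predicts the average response in the leaf containing the query point. The names `CentredTree` and `centred_forest_predict`, the restriction to inputs in [0, 1]^d, and the convention that an empty leaf predicts 0 are assumptions made for this sketch, not details taken from the paper.

```python
import random


class CentredTree:
    """One centred tree on [0, 1]^dim with a fixed number of split levels."""

    def __init__(self, dim, depth, seed):
        self.dim = dim
        self.depth = depth
        self.rng = random.Random(seed)
        self.split_coord = {}  # node path -> split coordinate, drawn lazily
        self.sums = {}         # leaf path -> (sum of responses, count)

    def leaf(self, x):
        """Return the path (tuple of 0/1 moves) to the leaf containing x."""
        lo = [0.0] * self.dim
        hi = [1.0] * self.dim
        path = ()
        for _ in range(self.depth):
            if path not in self.split_coord:
                # Splits ignore the data: the coordinate is uniform at random.
                self.split_coord[path] = self.rng.randrange(self.dim)
            j = self.split_coord[path]
            mid = (lo[j] + hi[j]) / 2.0  # always cut the cell at its centre
            if x[j] < mid:
                hi[j] = mid
                path += (0,)
            else:
                lo[j] = mid
                path += (1,)
        return path

    def fit(self, X, y):
        """Accumulate the responses falling in each leaf."""
        for xi, yi in zip(X, y):
            key = self.leaf(xi)
            s, n = self.sums.get(key, (0.0, 0))
            self.sums[key] = (s + yi, n + 1)
        return self

    def predict(self, x):
        """Average response in the leaf of x (0.0 if the leaf is empty)."""
        s, n = self.sums.get(self.leaf(x), (0.0, 0))
        return s / n if n else 0.0


def centred_forest_predict(X, y, x, n_trees=50, depth=3, seed=0):
    """Average the predictions of independently drawn centred trees."""
    trees = [CentredTree(len(x), depth, seed + m).fit(X, y)
             for m in range(n_trees)]
    return sum(t.predict(x) for t in trees) / n_trees
```

In this sketch the estimator is written without reference to how the pairs (X_i, Y_i) were sampled; the i.i.d. or β-mixing hypothesis only enters the theoretical analysis.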


Multiple fault diagnosis for hydraulic systems using Nearest-centroid-with-DBA and Random-Forest-based-time-series-classification

2020 39th Chinese Control Conference (CCC) ◽

10.23919/ccc50068.2020.9189401 ◽

2020 ◽

Author(s):

Zhijie Peng ◽

Ke Zhang ◽

Yi Chai

Keyword(s):

Time Series ◽

Fault Diagnosis ◽

Random Forest ◽

Time Series Classification ◽

Hydraulic Systems ◽

Multiple Fault ◽

Multiple Fault Diagnosis


Improving Medication Regimen Recommendation for Parkinson’s Disease Using Sensor Technology

Sensors ◽

10.3390/s21103553 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3553

Author(s):

Jeremy Watts ◽

Anahita Khojandi ◽

Rama Vasudevan ◽

Fatta B. Nahab ◽

Ritesh A. Ramdhani

Keyword(s):

Time Series ◽

Parkinson's Disease ◽

Random Forest ◽

Treatment Planning ◽

Time Series Data ◽

Classification Model ◽

Series Data ◽

Demographic Information ◽

Subjective Data

Parkinson’s disease medication treatment planning is generally based on subjective data obtained through clinical, physician-patient interactions. The Personal KinetiGraph™ (PKG) and similar wearable sensors have shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to cluster patients based on levodopa regimens and response. The resulting clusters are then used to enhance treatment planning by providing improved initial treatment estimates to supplement a physician’s initial assessment. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the MDS-Unified Parkinson’s Disease Rating Scale-III (MDS-UPDRS-III) and the PKG sensor for movement staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective demographic information, MDS-UPDRS-III scores, and PKG time-series data. Clinically relevant clusters were partitioned by levodopa dose, medication administration frequency, and total levodopa equivalent daily dose—with the PKG providing similar symptomatic assessments to physician MDS-UPDRS-III scores. A random forest classifier trained on demographic information, MDS-UPDRS-III scores, and PKG time-series data was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 86.9%, an F1 score of 90.7%, and an AUC of 0.871. A model that relied solely on demographic information and PKG time-series data provided the next best performance with an accuracy of 83.8%, an F1 score of 88.5%, and an AUC of 0.831, hence further enabling fully remote assessments. 
These computational methods demonstrate the feasibility of using sensor-based data to cluster patients based on their medication responses with further potential to assist with medication recommendations.
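The three figures of merit quoted above (accuracy, F1 score, AUC) can be computed from a classifier's outputs roughly as follows. This is a generic sketch of the standard definitions for a binary problem, not the study's evaluation code; the function names and the toy labels in the usage note are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, `binary_metrics` gives an accuracy of 0.6 and an F1 score of 2/3. Accuracy and F1 depend on a hard decision threshold, whereas AUC is computed from the ranking of the scores.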


Onboard Radio Frequency Interference as the Origin of Inter-Satellite Biases for Microwave Humidity Sounders

Remote Sensing ◽

10.3390/rs11070866 ◽

2019 ◽

Vol 11 (7) ◽

pp. 866 ◽

Cited By ~ 2

Author(s):

Imke Hans ◽

Martin Burgdorf ◽

Stefan A. Buehler

Keyword(s):

Time Series ◽

Radio Frequency ◽

Time Dependent ◽

Radio Frequency Interference ◽

Climate Variables ◽

Compelling Evidence ◽

Climate Data ◽

The Earth ◽

Correction Scheme

Understanding the causes of inter-satellite biases in climate data records from observations of the Earth is crucial for constructing a consistent time series of the essential climate variables. In this article, we analyse the strong scan- and time-dependent biases observed for the microwave humidity sounders on board the NOAA-16 and NOAA-19 satellites. We find compelling evidence that radio frequency interference (RFI) is the cause of the biases. We also devise a correction scheme for the raw count signals for the instruments to mitigate the effect of RFI. Our results show that the RFI-corrected, recalibrated data exhibit distinctly reduced biases and provide consistent time series.


Optimal Rate of Convergence for Empirical Quantiles and Distribution Functions for Time Series

Journal of Time Series Analysis ◽

10.1111/jtsa.12189 ◽

2016 ◽

Vol 37 (6) ◽

pp. 825-836

Author(s):

Moritz Jirak

Keyword(s):

Time Series ◽

Rate Of Convergence ◽

Distribution Functions ◽

Optimal Rate ◽

Optimal Rate Of Convergence


Subsampling for Heavy Tailed, Nonstationary and Weakly Dependent Time Series

Applied Condition Monitoring - Cyclostationarity: Theory and Methods – IV ◽

10.1007/978-3-030-22529-2_2 ◽

2019 ◽

pp. 19-40

Author(s):

Elżbieta Gajecka-Mirek ◽

Jacek Leśkow

Keyword(s):

Time Series ◽

Heavy Tailed ◽

Weakly Dependent


Numerical study on CP of RC structures regarding the significance of the 100 mV decay criterion considering time dependent processes

Materials and Corrosion ◽

10.1002/maco.201810486 ◽

2018 ◽

Vol 70 (4) ◽

pp. 642-651 ◽

Cited By ~ 1

Author(s):

Christian Helm ◽

Michael Raupach

Keyword(s):

Numerical Study ◽

Time Dependent ◽

Rc Structures ◽

Dependent Processes


Automatic Mapping of Irrigated Areas in Mediteranean Context Using Landsat 8 Time Series Images and Random Forest Algorithm

IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2018.8517810 ◽

2018 ◽

Cited By ~ 2

Author(s):

Z. Benbahria ◽

I. Sebari ◽

H. Hajji ◽

M. F. Smiej

Keyword(s):

Time Series ◽

Random Forest ◽

Landsat 8 ◽

Random Forest Algorithm ◽

Automatic Mapping ◽

Time Series Images


Phybrata Sensors and Machine Learning for Enhanced Neurophysiological Diagnosis and Treatment

Sensors ◽

10.3390/s21217417 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7417

Author(s):

Alex J. Hope ◽

Utkarsh Vashisth ◽

Matthew J. Parker ◽

Andreas B. Ralston ◽

Joshua M. Roper ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Random Forest ◽

Binary Classification ◽

Classification Performance ◽

Support Vector ◽

Use Case ◽

Signal Features ◽

Test Population

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. 
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.


Species-specific audio detection: A comparison of three template-based classification algorithms using random forests

10.7287/peerj.preprints.2713 ◽

2017 ◽

Author(s):

Carlos J Corrada Bravo ◽

Rafael Álvarez Berríos ◽

T. Mitchell Aide

Keyword(s):

Random Forest ◽

Random Forests ◽

Random Forest Classifier ◽

Classification Algorithms ◽

Statistical Features ◽

Web Based ◽

Average Accuracy ◽

Species Specific ◽

Web Based System

We developed a web-based, cloud-hosted system that allows users to archive, listen to, visualize, and annotate recordings. The system also provides tools to convert these annotations into datasets that can be used to train a computer to detect the presence or absence of a species. The algorithm used by the system was selected after comparing the accuracy and efficiency of three variants of a template-based classification algorithm. The algorithm computes a similarity vector by comparing a template of a species call with time increments across the spectrogram. Statistical features are extracted from this vector and used as input for a Random Forest classifier that predicts presence or absence of the species in the recording. The fastest algorithm variant had the highest average accuracy and specificity; therefore, it was implemented in the ARBIMON web-based system.
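The template-matching step described above might look roughly like the following Python sketch. The abstract does not say which similarity measure or which statistical features the three variants use, so the Pearson correlation and the feature set below are assumptions, as are the function names and the layout of the spectrogram (a list of frequency rows, each a list of time frames).

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def similarity_vector(spectrogram, template, step=1):
    """Slide the template along the time axis, one similarity per offset.

    Both inputs are lists of frequency rows with the same number of rows.
    """
    width = len(template[0])
    n_time = len(spectrogram[0])
    flat_template = [v for row in template for v in row]
    sims = []
    for start in range(0, n_time - width + 1, step):
        # Cut out the window covering the same frequency rows as the template.
        window = [v for row in spectrogram for v in row[start:start + width]]
        sims.append(pearson(window, flat_template))
    return sims


def summary_features(sims):
    """Hypothetical statistical features fed to a downstream classifier."""
    return {
        "max": max(sims),
        "mean": statistics.fmean(sims),
        "std": statistics.pstdev(sims),
        "argmax": sims.index(max(sims)),
    }
```

A feature dictionary of this kind, computed once per recording, would then form one input row for the Random Forest classifier. A larger `step` trades temporal resolution for speed.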


For Honor, for Toxicity

Proceedings of the ACM on Human-Computer Interaction ◽

10.1145/3474680 ◽

2021 ◽

Vol 5 (CHI PLAY) ◽

pp. 1-29

Author(s):

Alessandro Canossa ◽

Dmitry Salimov ◽

Ahmad Azadvar ◽

Casper Harteveld ◽

Georgios Yannakakis

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forests ◽

Initial Study ◽

Unfair Advantage ◽

Offensive Behavior ◽

Forest Models ◽

Random Forest Models ◽

Action Type ◽

Degree Of Severity

Is it possible to detect toxicity in games just by observing in-game behavior? If so, what are the behavioral factors that will help machine learning to discover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing in-game behavior for players that have been labeled as toxic (i.e., players that have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay with a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are defined by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings, based on supervised learning with random forests, suggest that it is not only possible to behaviorally distinguish sanctioned from unsanctioned players based on selected features of gameplay; it is also possible to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and type, with an accuracy of at least 82%, on average, on unseen players. This research shows that observing in-game behavior can support the work of community managers in moderating and possibly containing the burden of toxic behavior.
