Neural Activity Classification with Machine Learning Models Trained on Interspike Interval Time-Series Data

2021 ◽  
Author(s):  
Ivan Lazarevich ◽  
Ilya Prokin ◽  
Boris Gutkin ◽  
Victor Kazantsev

Modern, well-performing approaches to neural decoding are based on machine learning models such as decision tree ensembles and deep neural networks. The wide range of algorithms that can be utilized to learn from neural spike trains, which are essentially time-series data, results in the need for diverse and challenging benchmarks for neural decoding, similar to those in the fields of computer vision and natural language processing. In this work, we propose a spike train classification benchmark, based on open-access neural activity datasets and consisting of several learning tasks such as stimulus type classification, animal behavioral state prediction, and neuron type identification. We demonstrate that an approach based on hand-crafted time-series feature engineering establishes a strong baseline, performing on par with state-of-the-art deep learning-based models for neural decoding. We release the code to reproduce the reported results.
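As a rough illustration of such a hand-crafted feature baseline, the sketch below (not the authors' released code) summarizes each spike train by simple statistics of its interspike-interval distribution and trains a gradient-boosting classifier on them; the synthetic spike trains, feature set, and classifier settings are illustrative assumptions.

```python
# Minimal sketch of an ISI feature-engineering baseline (illustrative only).
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def isi_features(spike_times):
    """Summary statistics of the interspike-interval (ISI) distribution."""
    isi = np.diff(np.sort(spike_times))
    return np.array([
        isi.mean(),                        # mean ISI
        isi.std(),                         # ISI variability
        isi.std() / (isi.mean() + 1e-12),  # coefficient of variation
        stats.skew(isi),                   # asymmetry of the ISI distribution
        stats.kurtosis(isi),               # tail heaviness
        np.median(isi),
    ])

# Toy dataset: spike trains generated from two hypothetical firing regimes.
rng = np.random.default_rng(0)
scales = rng.choice([0.01, 0.03], size=100)              # two mean ISI scales
trains = [np.cumsum(rng.exponential(s, 200)) for s in scales]
y = (scales == 0.03).astype(int)                         # class = firing regime
X = np.stack([isi_features(t) for t in trains])

clf = GradientBoostingClassifier()
print("3-fold CV accuracy:", cross_val_score(clf, X, y, cv=3).mean())
```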

2021 ◽  
Author(s):  
Erik Otović ◽  
Marko Njirjak ◽  
Dario Jozinović ◽  
Goran Mauša ◽  
Alberto Michelini ◽  
...  

In this study, we compared the performance of machine learning models trained on time-series data using transfer learning with that of models trained from scratch. Four machine learning models were used for the experiment: two were taken from the field of seismology, and the other two are general-purpose models for working with time-series data. The accuracy of the selected models was systematically observed and analyzed when switching within the same domain of application (seismology), as well as between mutually different domains of application (seismology, speech, medicine, finance). In seismology, we used two databases of local earthquakes (one in counts, and the other with the instrument response removed) and a database of global earthquakes for predicting earthquake magnitude; the other datasets targeted classifying spoken words (speech), predicting stock prices (finance), and classifying muscle movement from EMG signals (medicine).

In practice, it is very demanding and sometimes impossible to collect labeled datasets large enough to successfully train a machine learning model. Therefore, in our experiment, we used reduced datasets of 1,500 and 9,000 data instances to mimic such conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those that used transfer learning and those that were trained from scratch. We compared the performance of the paired models in order to draw conclusions about the utility of transfer learning. To confirm the validity of the obtained results, we repeated the experiments several times and applied statistical tests to confirm the significance of the results. The study shows when, within the set experimental framework, the transfer of knowledge brought improvements in terms of model accuracy and model convergence rate.

Our results show that it is possible to achieve better performance and faster convergence by transferring knowledge from the domain of global earthquakes to the domain of local earthquakes, and sometimes also vice versa. However, improvements in seismology can sometimes also be achieved by transferring knowledge from the medical and audio domains. The results show that the transfer of knowledge between the other domains brought even more significant improvements than those within the field of seismology. For example, models in the field of sound recognition achieved much better performance compared to classical models, and the domain of sound recognition proved very compatible with knowledge from other domains. We came to similar conclusions for the domains of medicine and finance. Ultimately, the paper offers suggestions on when transfer learning is useful, and the explanations offered can provide a good starting point for knowledge transfer using time-series data.
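The following sketch illustrates the general transfer-learning recipe discussed above, not the study's actual models or datasets: a small 1D CNN is pretrained on a larger source time-series dataset, its convolutional base is frozen, and a new head is trained on a reduced target dataset. All shapes, sizes, and hyperparameters are placeholder assumptions.

```python
# Minimal transfer-learning sketch for time-series classification (illustrative).
import numpy as np
import tensorflow as tf

def make_model(n_classes):
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 7, activation="relu", input_shape=(256, 1)),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(9000, 256, 1)), rng.integers(0, 3, 9000)  # source domain
X_tgt, y_tgt = rng.normal(size=(1500, 256, 1)), rng.integers(0, 2, 1500)  # reduced target

# 1) Pretrain on the source domain.
source = make_model(n_classes=3)
source.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
source.fit(X_src, y_src, epochs=2, verbose=0)

# 2) Transfer: reuse the convolutional base, replace and retrain only the head.
base = tf.keras.Sequential(source.layers[:-1])
base.trainable = False                                   # freeze pretrained features
target = tf.keras.Sequential([base, tf.keras.layers.Dense(2, activation="softmax")])
target.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
target.fit(X_tgt, y_tgt, epochs=2, verbose=0)
```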


2021 ◽  
Vol 43 (3) ◽  
pp. 206-217
Author(s):  
Min Ji Kim ◽  
Seon Jeong Byeon ◽  
Kyung Min Kim ◽  
Johng-Hwa Ahn

Objectives: In this study, we select input factors for machine learning models to predict dissolved oxygen (DO) in Gyeongan Stream and compare performance evaluation indicators to find the optimal model. Methods: Water quality data from specific points of Gyeongan Stream were collected between January 15, 1998 and December 30, 2019. The pretreated data were divided into train and test sets with a ratio of 7:3. Among machine learning models, we used random forest (RF), artificial neural network (ANN), convolutional neural network (CNN), and gated recurrent unit (GRU). RF and ANN were tested with both a random split and time-series data, while CNN and GRU were tested using only time-series data. Performance evaluation indicators such as the square of the correlation coefficient (R²), root mean square error (RMSE), and mean absolute error (MAE) were used to compare the optimal results for the models. Results and Discussion: Based on the RF variable importance results and references, water temperature, pH, electrical conductivity, PO4-P, NH4-N, total phosphorus, suspended solids, and NO3-N were used as input factors. Both RF and ANN performed better with time-series data than with a random split. Model performance was good in the order RF > CNN > GRU > ANN. Conclusions: The eight input factors (water temperature, pH, electrical conductivity, PO4-P, NH4-N, total phosphorus, suspended solids, and NO3-N) were selected for machine learning models to predict DO in Gyeongan Stream. The best model for DO prediction was the RF model with time-series data. Therefore, we suggest that RF with the eight input factors could be used to predict DO in streams.
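A minimal sketch of the best-performing configuration described above, using synthetic stand-in data rather than the Gyeongan Stream measurements: a random forest regressor trained on the eight selected input factors with a chronological 7:3 split and evaluated with R², RMSE, and MAE. Column names are illustrative.

```python
# Minimal sketch: random forest DO prediction on a chronological split (illustrative).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

FEATURES = ["water_temp", "pH", "conductivity", "PO4_P",
            "NH4_N", "total_P", "suspended_solids", "NO3_N"]

# Synthetic stand-in for the stream measurements, ordered in time.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, len(FEATURES))), columns=FEATURES)
df["DO"] = 10 - 0.3 * df["water_temp"] + rng.normal(0, 0.5, len(df))  # toy target

split = int(len(df) * 0.7)                       # 7:3 chronological (time-series) split
train, test = df.iloc[:split], df.iloc[split:]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(train[FEATURES], train["DO"])
pred = rf.predict(test[FEATURES])

print("R2  :", r2_score(test["DO"], pred))
print("RMSE:", mean_squared_error(test["DO"], pred) ** 0.5)
print("MAE :", mean_absolute_error(test["DO"], pred))
```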


2021 ◽  
Author(s):  
Bon San Koo ◽  
Miso Jang ◽  
Ji Seon Oh ◽  
Keewon Shin ◽  
Seunghun Lee ◽  
...  

Background: Radiographic progression in patients with ankylosing spondylitis (AS) varies between individuals, and its evaluation requires a long period of time. Previous statistical studies of radiographic progression have limitations in integrating and analyzing multiple variables of various types. The purpose of this study was to establish the application of machine learning models for predicting radiographic progression in patients with AS using time-series data from electronic medical records (EMRs). Methods: EMR data, including baseline characteristics, laboratory findings, drug administration, and modified Stoke Ankylosing Spondylitis Spine Score (mSASSS), were collected from 1,123 AS patients who were followed up for 18 years at a single center, at the time of the first (T1), second (T2), and third (T3) visits. The radiographic progression at the (n + 1)th visit (Pn+1 = (mSASSSn+1 − mSASSSn) / (Tn+1 − Tn) ≥ 1 unit per year) was predicted using follow-up visit datasets from T1 to Tn. Three machine learning methods (logistic regression with the least absolute shrinkage and selection operator, random forest, and extreme gradient boosting algorithms) with three-fold cross-validation were used. Results: The random forest model using the T1 EMR dataset showed the highest performance in predicting the radiographic progression P2 among all the machine learning models tested. The mean accuracy and area under the curve were 73.73% and 0.79, respectively. Among the T1 variables, the most important for predicting radiographic progression were, in order, total mSASSS, age, and alkaline phosphatase. Conclusion: Prognosis prediction models using time-series data showed reasonable performance with clinical features from the first-visit dataset for predicting radiographic progression. Additional feature data such as spine radiographs or life-log data may improve the performance of these models.
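A minimal sketch of the model comparison described above, with synthetic placeholders for the EMR feature matrix and progression labels: LASSO-penalized logistic regression, random forest, and XGBoost are compared under three-fold cross-validation. Feature dimensions and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: comparing three classifiers with 3-fold CV (illustrative data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1123, 20))          # placeholder EMR features (e.g. mSASSS, age, ALP, ...)
y = rng.integers(0, 2, size=1123)        # 1 = progression >= 1 mSASSS unit per year

models = {
    "LASSO logistic": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "Random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```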


Author(s):  
J. Fan ◽  
Q. Li ◽  
J. Hou ◽  
X. Feng ◽  
H. Karimian ◽  
...  

Time-series data in practical applications always contain missing values due to sensor malfunction, network failure, outliers, etc. To handle missing values in time series, as well as the lack of consideration of temporal properties in conventional machine learning models, we propose a spatiotemporal prediction framework based on missing-value processing algorithms and a deep recurrent neural network (DRNN). By using a missing tag and a missing interval to represent time-series patterns, we implement three different missing-value fixing algorithms, which are further incorporated into a deep neural network that consists of LSTM (Long Short-Term Memory) layers and fully connected layers. Real-world air quality and meteorological datasets (Jingjinji area, China) are used for model training and testing. Deep feed-forward neural networks (DFNN) and gradient boosting decision trees (GBDT) are trained as baseline models against the proposed DRNN. The performance of the three missing-value fixing algorithms, as well as of the different machine learning models, is evaluated and analysed. Experiments show that the proposed DRNN framework outperforms both DFNN and GBDT, validating the capacity of the proposed framework. Our results also provide useful insights for a better understanding of the different strategies for handling missing values.
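A minimal sketch of the missing tag / missing interval representation and an LSTM-plus-dense architecture of the kind described above; the forward-fill imputation, window length, and synthetic data are illustrative assumptions rather than the paper's exact algorithms.

```python
# Minimal sketch: augmenting a series with missing tag and missing interval
# features, then feeding it to an LSTM regressor (illustrative only).
import numpy as np
import tensorflow as tf

def add_missing_features(series):
    """series: 1-D float array with np.nan marking missing values."""
    tag = np.isnan(series).astype(float)           # missing tag (1 = imputed)
    interval = np.zeros_like(series)
    filled, last, since = series.copy(), 0.0, 0.0
    for t, v in enumerate(series):
        if np.isnan(v):
            since += 1.0
            filled[t] = last                        # simple forward fill
        else:
            since, last = 0.0, v
        interval[t] = since                         # steps since last valid value
    return np.stack([filled, tag, interval], axis=-1)

rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 48))                    # 500 windows, 48 hourly steps
raw[rng.random(raw.shape) < 0.2] = np.nan           # 20% missing at random
X = np.stack([add_missing_features(w) for w in raw])
y = rng.normal(size=500)                            # e.g. next-hour pollutant level

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(48, 3)),  # recurrent layer over the window
    tf.keras.layers.Dense(32, activation="relu"),   # fully connected layers
    tf.keras.layers.Dense(1),
])
model.compile("adam", "mse")
model.fit(X, y, epochs=2, verbose=0)
```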


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Taylor Chomiak ◽  
Neilen P. Rasiah ◽  
Leonardo A. Molina ◽  
Bin Hu ◽  
Jaideep S. Bains ◽  
...  

Here we introduce Local Topological Recurrence Analysis (LoTRA), a simple computational approach for analyzing time-series data. Its versatility is demonstrated using simulated data, Parkinsonian gait, and in vivo brain dynamics. We also show that this algorithm can be used to build a remarkably simple machine-learning model capable of outperforming deep-learning models in detecting Parkinson’s disease from a single digital handwriting test.


2020 ◽  
Author(s):  
Zakhriya Alhassan ◽  
Matthew Watson ◽  
David Budgen ◽  
Riyad Alshammari ◽  
Ali Alessan ◽  
...  

BACKGROUND Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients at risk of developing serious chronic health problems such as diabetes and cardiovascular disease. Early preventive interventions based upon advanced predictive models using electronic health record (EHR) data for such patients can ultimately help provide better health outcomes. OBJECTIVE Our study investigates the performance of predictive models that forecast HbA1c elevation levels by employing machine learning approaches using data from current and previous visits in the EHR systems of patients who had not previously been diagnosed with any type of diabetes. METHODS This study employed one statistical model, three commonly used conventional machine learning models, and a deep learning model to predict patients’ current levels of HbA1c. For the deep learning model, we also integrated current-visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and to understand the reasons behind their decisions. All models were trained and tested using a large, naturally balanced dataset from Saudi Arabia with 18,844 unique patient records. RESULTS The machine learning models achieved the best results for predicting current HbA1c elevation risk. The deep learning model outperformed the statistical and conventional machine learning models with respect to all reported measures when employing time-series data. The best-performing model was the multi-layer perceptron (MLP), which achieved an accuracy of 74.52% when used with historical data. CONCLUSIONS This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels. For deep learning in particular, utilizing the patient's longitudinal time-series data improved performance and affected the relative importance of the predictors used. The models showed robust results that were consistent with comparable studies.
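A minimal sketch of the longitudinal setup described above, with synthetic placeholders for the EHR features: visit histories are flattened into a single feature vector and fed to a multi-layer perceptron that predicts HbA1c elevation. Dimensions and hyperparameters are illustrative assumptions, not the study's configuration.

```python
# Minimal sketch: MLP on concatenated current + historical visit features (illustrative).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_patients, n_visits, n_feats = 18844, 3, 12
visits = rng.normal(size=(n_patients, n_visits, n_feats))  # placeholder longitudinal EHR data
y = rng.integers(0, 2, size=n_patients)                    # 1 = elevated HbA1c

X = visits.reshape(n_patients, -1)       # flatten each patient's visit history into one vector
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=50, random_state=0)
mlp.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, mlp.predict(X_te)))
```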


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1162
Author(s):  
Dingming Wu ◽  
Xiaolong Wang ◽  
Jingyong Su ◽  
Buzhou Tang ◽  
Shaocong Wu

Time-series prediction has been widely applied in the finance industry, in applications such as stock market and commodity price forecasting, and machine learning methods have been widely used for financial time-series prediction in recent years. How to label financial time-series data, which determines the prediction accuracy of machine learning models and, subsequently, the final investment returns, is an active research topic. Existing labeling methods for financial time series mainly label data by comparing the current value with those of a short period in the future. However, financial time-series data are typically non-linear, with obvious short-term randomness. As a result, these labeling methods fail to capture the continuous trend features of financial time series, leading to a difference between their labeling results and real market trends. In this paper, a new labeling method called “continuous trend labeling” is proposed to address this problem. In the feature preprocessing stage, we propose a new method that avoids the look-ahead bias present in traditional data standardization or normalization processes. We then give a detailed logical explanation, define continuous trend labeling, and present an automatic labeling algorithm to extract the continuous trend features of financial time-series data. Experiments on the Shanghai Composite Index, the Shenzhen Component Index, and several Chinese stocks show that our labeling method outperforms state-of-the-art labeling methods in terms of classification accuracy and other classification evaluation metrics. The results also show that deep learning models such as LSTM and GRU are better suited to predicting financial time-series data.
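A minimal sketch of trend-based labeling in the spirit of the method described above (not the paper's exact definition): each time step is labeled by the continuous trend segment it belongs to, and a trend is only reversed once the price retraces more than a threshold from its running extremum. The threshold and the reversal rule are illustrative assumptions.

```python
# Minimal sketch of continuous-trend-style labeling (illustrative assumption).
import numpy as np

def continuous_trend_labels(prices, threshold=0.05):
    """Return +1 (up-trend) / -1 (down-trend) labels for each time step."""
    labels = np.zeros(len(prices), dtype=int)
    direction, extremum, start = 1, prices[0], 0
    for t, p in enumerate(prices):
        if direction == 1:
            if p > extremum:
                extremum = p                           # new running high
            elif p < extremum * (1 - threshold):       # retrace -> trend reversal
                labels[start:t] = 1
                direction, extremum, start = -1, p, t
        else:
            if p < extremum:
                extremum = p                           # new running low
            elif p > extremum * (1 + threshold):
                labels[start:t] = -1
                direction, extremum, start = 1, p, t
    labels[start:] = direction                         # label the final open segment
    return labels

prices = np.cumprod(1 + np.random.default_rng(0).normal(0, 0.01, 500))  # toy price path
print(continuous_trend_labels(prices)[:20])
```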


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1064
Author(s):  
Michele Resta ◽  
Anna Monreale ◽  
Davide Bacciu

The biomedical field is characterized by an ever-increasing production of sequential data, which often come in the form of biosignals capturing the time evolution of physiological processes, such as blood pressure and brain activity. This has motivated a large body of research on developing machine learning techniques for the predictive analysis of such biosignals. Unfortunately, in high-stakes decision making such as clinical diagnosis, the opacity of machine learning models becomes a crucial issue to address in order to increase trust in and adoption of AI technology. In this paper, we propose a model-agnostic explanation method, based on occlusion, that enables learning the input’s influence on the model’s predictions. We specifically target problems involving the predictive analysis of time-series data and the models typically used for data of this nature, i.e., recurrent neural networks. Our approach provides two different kinds of explanations: one suitable for technical experts, who need to verify the quality and correctness of machine learning models, and one suited to physicians, who need to understand the rationale underlying a prediction in order to make informed decisions. Extensive experiments on different physiological data demonstrate the effectiveness of our approach in both classification and regression tasks.
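A minimal sketch of an occlusion-style explanation for a recurrent time-series model, under the assumption that segments are hidden by zero-masking; the toy GRU model, window size, and masking value are illustrative choices rather than the paper's method.

```python
# Minimal sketch: occlusion-based importance scores for a time-series model (illustrative).
import numpy as np
import tensorflow as tf

def occlusion_importance(model, x, window=10, mask_value=0.0):
    """Influence of each time window on the model output, estimated by masking it."""
    baseline = model.predict(x[None], verbose=0)[0]
    scores = np.zeros(x.shape[0])
    for start in range(0, x.shape[0], window):
        occluded = x.copy()
        occluded[start:start + window] = mask_value    # hide this segment
        out = model.predict(occluded[None], verbose=0)[0]
        scores[start:start + window] = np.abs(baseline - out).sum()
    return scores                                      # higher = larger influence

# Toy recurrent model and biosignal, for illustration only (untrained weights).
model = tf.keras.Sequential([tf.keras.layers.GRU(16, input_shape=(100, 1)),
                             tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile("adam", "binary_crossentropy")
x = np.random.default_rng(0).normal(size=(100, 1)).astype("float32")
print(occlusion_importance(model, x, window=20))
```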

