Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262131
Author(s):  
Adil Aslam Mir ◽  
Kimberlee Jane Kearfott ◽  
Fatih Vehbi Çelebi ◽  
Muhammad Rafique

A new methodology, imputation by feature importance (IBFI), is studied that can be applied with any machine learning method to efficiently fill in missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data: XGBoost is used as the learning algorithm, and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series, where the controls are available, by taking the attribute of interest as the response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, which challenges machine learning methods because some controls are missing; IBFI appears to have an advantage in such circumstances. For testing, radon time series (RTS) data collected from 1 March 2017 to 11 May 2018 were used, a period that included four seismic events.
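The core loop of IBFI (rank the attributes by importance, then iteratively re-impute each attribute with a base learner trained on the others) can be sketched as follows. The paper derives importance from XGBoost; this self-contained sketch substitutes a correlation-based importance proxy and a linear least-squares base learner, so it illustrates only the iteration scheme, not the published implementation.

```python
import numpy as np

def ibfi_sketch(X, n_iter=5):
    """Iteratively impute NaNs column by column, visiting columns in
    descending order of a simple importance proxy (mean absolute
    correlation with the other columns).  IBFI proper uses XGBoost
    feature importances and an XGBoost base learner instead."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    # Initial fill: column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.nonzero(mask)[1])
    for _ in range(n_iter):
        corr = np.abs(np.corrcoef(X, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        importance = corr.mean(axis=0)
        for j in np.argsort(importance)[::-1]:      # most "important" first
            rows = mask[:, j]
            if not rows.any():
                continue
            others = [k for k in range(X.shape[1]) if k != j]
            A = np.c_[np.ones(len(X)), X[:, others]]  # linear base learner
            beta, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            X[rows, j] = A[rows] @ beta               # re-impute column j
    return X
```

With strongly correlated columns (as assumed for SRGC and its environmental controls), the imputed entries converge close to the values a regression on the complete series would predict.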

Author(s):  
Chisimkwuo John ◽  
Emmanuel J. Ekpenyong ◽  
Charles C. Nworu

This study assessed five approaches for imputing missing values. The evaluated methods include Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-linear Iterative Partial Least Squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15% and 20% were created under a missing completely at random (MCAR) assumption, using R software, on five variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1), and Private Sector Demand Deposits (PSDD)) from the Nigerian quarterly monetary aggregate dataset covering 1981 to 2019. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values, and the performance of the PCA imputation approaches was evaluated using the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that the bPCA, llsPCA and pPCA methods performed better than the other imputation methods, with bPCA the more appropriate method and llsPCA the best, as it appears more stable than the others across the proportions of missingness.
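A minimal sketch of the evaluation step, scoring imputed values against the artificially deleted entries with the three criteria named above. The NRMSE normalization convention (here, division by the standard deviation of the true values) is an assumption, since the study does not state which convention it uses.

```python
import numpy as np

def imputation_scores(true_vals, imputed_vals):
    """MFE, RMSE and NRMSE over the artificially deleted entries.
    NRMSE normalizes RMSE by the standard deviation of the true
    values (one common convention among several)."""
    e = np.asarray(imputed_vals, float) - np.asarray(true_vals, float)
    mfe = e.mean()                      # signed bias of the imputations
    rmse = np.sqrt((e ** 2).mean())
    nrmse = rmse / np.std(true_vals)    # scale-free, comparable across variables
    return mfe, rmse, nrmse
```

Because NRMSE is scale-free, it allows the five monetary aggregates, which differ greatly in magnitude, to be compared on one footing.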


The aim of this research is to perform risk modelling through sentiment analysis of Twitter posts. We analyze the posts of several users, or of a particular user, to determine whether they may pose a cause for concern to society. Each sentiment (happy, sad, anger, and other emotions) is mapped to a severity scale in a final table to which a machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.
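A toy sketch of the severity-scaling step described above. The sentiment-to-severity weights and the averaging rule are purely illustrative, since the research does not publish its scale.

```python
# Hypothetical severity weights per detected sentiment; illustrative only.
SEVERITY = {"happy": 0, "sad": 1, "fear": 2, "anger": 3}

def user_risk_score(posts):
    """Average severity of a user's posts over the monitored period:
    a simple stand-in for the 'scaling of severity' table that is
    fed to the machine learning step."""
    scores = [SEVERITY.get(sentiment, 0) for _, sentiment in posts]
    return sum(scores) / len(scores) if scores else 0.0
```

A per-user score like this, tracked over time and per topic, would form one row of the final table on which a classifier is trained.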


Author(s):  
Daniela A. Gomez-Cravioto ◽  
Ramon E. Diaz-Ramos ◽  
Francisco J. Cantu-Ortiz ◽  
Hector G. Ceballos

Abstract To understand and address the spread of the SARS-CoV-2 epidemic, machine learning offers fundamental tools. This study presents the use of machine learning techniques for projecting COVID-19 infections and deaths in Mexico. The research has three main objectives: first, to identify which function best fits the growth of the infected population in Mexico; second, to determine the feature importance of climate and mobility; third, to compare the results of a traditional time series statistical model with a modern machine learning approach. The motivation for this work is to support health care providers in their preparation and planning. The methods compared are linear, polynomial, and generalized logistic regression models for describing the growth of COVID-19 incidence in Mexico. Additionally, machine learning and time series techniques are used to identify feature importance and to forecast daily cases and fatalities. The study uses the publicly available data sets from Johns Hopkins University of Medicine in conjunction with mobility rates obtained from Google’s Mobility Reports and climate variables acquired from the Weather Online API. The results suggest that the logistic growth model best fits the pandemic’s behavior, that climate and mobility variables correlate sufficiently with the disease numbers, and that a long short-term memory (LSTM) network can be exploited for predicting daily cases. Given this, we propose a model to predict daily cases and fatalities for SARS-CoV-2 using time series data, mobility, and weather variables.
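As one concrete illustration of fitting a logistic growth curve to cumulative case counts, the sketch below linearizes the logistic function for a supplied carrying capacity L (which in practice could be scanned over a grid). This is a simplified stand-in for the generalized logistic regression used in the study, not the authors' method.

```python
import numpy as np

def logistic(t, L, k, t0):
    """Logistic growth curve: carrying capacity L, growth rate k,
    inflection point t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

def fit_logistic(t, y, L):
    """Fit k and t0 by linearizing log(L/y - 1) = -k*t + k*t0,
    assuming the carrying capacity L is supplied externally."""
    z = np.log(L / np.asarray(y, float) - 1.0)
    slope, intercept = np.polyfit(t, z, 1)
    k = -slope
    t0 = intercept / k
    return k, t0
```

The linearization only applies where 0 < y < L; for real case counts a nonlinear least-squares fit over all three parameters is the more robust choice.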


2019 ◽  
Vol 40 (1) ◽  
pp. 127-135 ◽  
Author(s):  
Khemissi Houari ◽  
Tarik Hartani ◽  
Boualem Remini ◽  
Abdelouhab Lefkir ◽  
Leila Abda ◽  
...  

Abstract In this paper, the capacity of an Adaptive-Network-Based Fuzzy Inference System (ANFIS) for predicting the salinity of the Tafna River is investigated. Time series data of daily liquid flow and saline concentrations from the gauging station of Pierre du Chat (160801) were used for training, validation and testing of the hybrid model. Several criteria were used to assess the accuracy of the results: the coefficient of determination (R2), the Nash–Sutcliffe efficiency coefficient (E), the ratio of the root mean squared error to the standard deviation of the observations (RSR), and graphical techniques. The model produced satisfactory results and showed very good agreement between the predicted and observed data, with R2 of 88% for training, 78.01% for validation and 80.00% for testing; E of 85.84% for training, 82.51% for validation and 78.17% for testing; and RSR of 2% for training, 10% for validation and 49% for testing.
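The two hydrological goodness-of-fit measures used above can be computed as follows. This is a generic sketch of the standard definitions of E and RSR, not the authors' code.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash–Sutcliffe efficiency E: 1 minus the ratio of residual
    variance to the variance of the observations (1 is a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()

def rsr(obs, sim):
    """RSR: RMSE normalized by the standard deviation of the
    observations (lower is better, 0 is a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(((obs - sim) ** 2).mean())
    return rmse / obs.std()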


2020 ◽  
Vol 5 ◽  
Author(s):  
Kentaro Kumagai

Many public facilities such as community halls and gymnasiums are designated as evacuation sites when disasters occur. From the viewpoint of managing such facilities, it is necessary to monitor their usage and to respond immediately when an anomaly occurs. In this study, an integrated system of IoT sensors and machine learning for anomaly detection of pedestrian flow was proposed for buildings expected to serve as emergency evacuation sites in the event of a disaster. As a trial of the system, infrared sensors were installed in a research building of a university, and data on visitors to the fourth floor of the building were collected as time series data of pedestrian flow. The results showed that anomalies of pedestrian flow at an arbitrary time of day with an occurrence probability of 5% or less can be detected properly using the collected data.
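A minimal sketch of the 5%-probability anomaly rule described above: a pedestrian count for a given time slot is flagged when it falls outside the central 95% band of the historical counts for that slot. The quantile-based formulation is an assumption, since the study does not specify its detection model.

```python
import numpy as np

def is_anomalous(history, value, alpha=0.05):
    """Flag `value` as an anomaly if it lies outside the central
    1 - alpha band of historical counts for the same time slot,
    i.e. an occurrence probability of alpha (5%) or less."""
    lo, hi = np.quantile(history, [alpha / 2, 1 - alpha / 2])
    return value < lo or value > hi
```

In the deployed system, `history` would hold past sensor counts for the same time-of-day slot, so the threshold adapts to normal daily usage patterns.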


The main objective of this paper is to analyze the characteristics and features that affect the fluctuations of cryptocurrency prices and to develop an interactive cryptocurrency chatbot for providing predictive analysis of cryptocurrency prices. The chatbot is developed using the IBM Watson Assistant service. The predictive analytics is performed by analyzing the datasets of various cryptocurrencies and applying appropriate time series models. Time series forecasting is used for predicting the future values of the prices. Predictive models such as ARIMA are used, with the mean squared error of the fitted model computed for evaluation. Facebook’s Prophet package, which implements a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly and weekly seasonality, is further used to predict cryptocurrency prices.
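The autoregressive core of ARIMA-style forecasting can be sketched without external dependencies as below: an AR(p) model is fit by least squares and iterated forward. This is a simplified stand-in for the full ARIMA and Prophet models used in the paper, which add differencing, moving-average terms and seasonality.

```python
import numpy as np

def ar_forecast(series, p=2, steps=5):
    """Fit an AR(p) model by least squares and forecast `steps`
    values ahead: the autoregressive component of the ARIMA family."""
    y = np.asarray(series, float)
    # Design matrix of lagged values: column k holds lag k+1.
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    X = np.c_[np.ones(len(X)), X]
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    out = list(y)
    for _ in range(steps):
        lags = out[-1:-p - 1:-1]             # most recent p values first
        out.append(beta[0] + np.dot(beta[1:], lags))
    return np.array(out[len(y):])
```

Evaluating such a forecast with mean squared error against a held-out window mirrors the model-selection step the paper describes.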

