Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262131
Author(s):  
Adil Aslam Mir ◽  
Kimberlee Jane Kearfott ◽  
Fatih Vehbi Çelebi ◽  
Muhammad Rafique

A new methodology, imputation by feature importance (IBFI), is studied that can be applied with any machine learning method to efficiently fill in missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data: XGBoost is used as the learning algorithm, and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series, where the controls are available, by taking the attribute of interest as the response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, which challenges machine learning methods because some controls are missing; IBFI appears to have an advantage in such circumstances. For testing, radon time series (RTS) data collected from 1 March 2017 to 11 May 2018 were used, a period that included four seismic events.
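The core loop of IBFI (rank the attributes by importance, then iteratively re-impute each attribute with a base learner trained on the others) can be sketched as follows. The paper derives importance from XGBoost; this self-contained sketch substitutes a correlation-based importance proxy and a linear least-squares base learner, so it illustrates only the iteration scheme, not the published implementation.

```python
import numpy as np

def ibfi_sketch(X, n_iter=5):
    """Iteratively impute NaNs column by column, visiting columns in
    descending order of a simple importance proxy (mean absolute
    correlation with the other columns).  IBFI proper uses XGBoost
    feature importances and an XGBoost base learner instead."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    # Initial fill: column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.nonzero(mask)[1])
    for _ in range(n_iter):
        corr = np.abs(np.corrcoef(X, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        importance = corr.mean(axis=0)
        for j in np.argsort(importance)[::-1]:      # most "important" first
            rows = mask[:, j]
            if not rows.any():
                continue
            others = [k for k in range(X.shape[1]) if k != j]
            A = np.c_[np.ones(len(X)), X[:, others]]  # linear base learner
            beta, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            X[rows, j] = A[rows] @ beta               # re-impute column j
    return X
```

With strongly correlated columns (as assumed for SRGC and its environmental controls), the imputed entries converge close to the values a regression on the complete series would predict.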

Author(s):  
Chisimkwuo John ◽  
Emmanuel J. Ekpenyong ◽  
Charles C. Nworu

This study assessed five approaches for imputing missing values. The evaluated methods include Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-linear Iterative Partial Least Squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15% and 20% were created under a missing completely at random (MCAR) assumption, using R software, on five variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1), and Private Sector Demand Deposits (PSDD)) from the Nigerian quarterly monetary aggregate dataset covering 1981 to 2019. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values, and the performance of the PCA imputation approaches was evaluated using the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that the bPCA, llsPCA and pPCA methods performed better than the other imputation methods, with bPCA the more appropriate method and llsPCA the best, as it appears more stable than the others across the proportions of missingness.
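A minimal sketch of the evaluation step, scoring imputed values against the artificially deleted entries with the three criteria named above. The NRMSE normalization convention (here, division by the standard deviation of the true values) is an assumption, since the study does not state which convention it uses.

```python
import numpy as np

def imputation_scores(true_vals, imputed_vals):
    """MFE, RMSE and NRMSE over the artificially deleted entries.
    NRMSE normalizes RMSE by the standard deviation of the true
    values (one common convention among several)."""
    e = np.asarray(imputed_vals, float) - np.asarray(true_vals, float)
    mfe = e.mean()                      # signed bias of the imputations
    rmse = np.sqrt((e ** 2).mean())
    nrmse = rmse / np.std(true_vals)    # scale-free, comparable across variables
    return mfe, rmse, nrmse
```

Because NRMSE is scale-free, it allows the five monetary aggregates, which differ greatly in magnitude, to be compared on one footing.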


The aim of this research is to perform risk modelling through sentiment analysis of Twitter posts. We analyze the posts of several users, or of a particular user, to determine whether they may pose a cause for concern to society. Each sentiment (happy, sad, anger, and other emotions) is mapped to a severity scale in a final table to which a machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.
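A toy sketch of the severity-scaling step described above. The sentiment-to-severity weights and the averaging rule are purely illustrative, since the research does not publish its scale.

```python
# Hypothetical severity weights per detected sentiment; illustrative only.
SEVERITY = {"happy": 0, "sad": 1, "fear": 2, "anger": 3}

def user_risk_score(posts):
    """Average severity of a user's posts over the monitored period:
    a simple stand-in for the 'scaling of severity' table that is
    fed to the machine learning step."""
    scores = [SEVERITY.get(sentiment, 0) for _, sentiment in posts]
    return sum(scores) / len(scores) if scores else 0.0
```

A per-user score like this, tracked over time and per topic, would form one row of the final table on which a classifier is trained.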


Author(s):  
Daniela A. Gomez-Cravioto ◽  
Ramon E. Diaz-Ramos ◽  
Francisco J. Cantu-Ortiz ◽  
Hector G. Ceballos

Abstract To understand and address the spread of the SARS-CoV-2 epidemic, machine learning offers fundamental tools. This study presents the use of machine learning techniques for projecting COVID-19 infections and deaths in Mexico. The research has three main objectives: first, to identify which function best fits the growth of the infected population in Mexico; second, to determine the feature importance of climate and mobility; third, to compare the results of a traditional time series statistical model with a modern machine learning approach. The motivation for this work is to support health care providers in their preparation and planning. The methods compared are linear, polynomial, and generalized logistic regression models for describing the growth of COVID-19 incidence in Mexico. Additionally, machine learning and time series techniques are used to identify feature importance and to forecast daily cases and fatalities. The study uses the publicly available data sets from Johns Hopkins University of Medicine in conjunction with mobility rates obtained from Google’s Mobility Reports and climate variables acquired from the Weather Online API. The results suggest that the logistic growth model best fits the pandemic’s behavior, that climate and mobility variables correlate sufficiently with the disease numbers, and that a long short-term memory (LSTM) network can be exploited for predicting daily cases. Given this, we propose a model to predict daily cases and fatalities for SARS-CoV-2 using time series data, mobility, and weather variables.
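As one concrete illustration of fitting a logistic growth curve to cumulative case counts, the sketch below linearizes the logistic function for a supplied carrying capacity L (which in practice could be scanned over a grid). This is a simplified stand-in for the generalized logistic regression used in the study, not the authors' method.

```python
import numpy as np

def logistic(t, L, k, t0):
    """Logistic growth curve: carrying capacity L, growth rate k,
    inflection point t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

def fit_logistic(t, y, L):
    """Fit k and t0 by linearizing log(L/y - 1) = -k*t + k*t0,
    assuming the carrying capacity L is supplied externally."""
    z = np.log(L / np.asarray(y, float) - 1.0)
    slope, intercept = np.polyfit(t, z, 1)
    k = -slope
    t0 = intercept / k
    return k, t0
```

The linearization only applies where 0 < y < L; for real case counts a nonlinear least-squares fit over all three parameters is the more robust choice.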


2019 ◽  
Vol 40 (1) ◽  
pp. 127-135 ◽  
Author(s):  
Khemissi Houari ◽  
Tarik Hartani ◽  
Boualem Remini ◽  
Abdelouhab Lefkir ◽  
Leila Abda ◽  
...  

Abstract In this paper, the capacity of an Adaptive-Network-Based Fuzzy Inference System (ANFIS) for predicting the salinity of the Tafna River is investigated. Time series data of daily liquid flow and saline concentrations from the gauging station of Pierre du Chat (160801) were used for training, validation and testing of the hybrid model. Several criteria were used to assess the accuracy of the results: the coefficient of determination (R2), the Nash–Sutcliffe efficiency coefficient (E), the ratio of the root mean squared error to the standard deviation of the observations (RSR), and graphical techniques. The model produced satisfactory results and showed very good agreement between the predicted and observed data, with R2 of 88% for training, 78.01% for validation and 80.00% for testing; E of 85.84% for training, 82.51% for validation and 78.17% for testing; and RSR of 2% for training, 10% for validation and 49% for testing.
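The two hydrological goodness-of-fit measures used above can be computed as follows. This is a generic sketch of the standard definitions of E and RSR, not the authors' code.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash–Sutcliffe efficiency E: 1 minus the ratio of residual
    variance to the variance of the observations (1 is a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()

def rsr(obs, sim):
    """RSR: RMSE normalized by the standard deviation of the
    observations (lower is better, 0 is a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(((obs - sim) ** 2).mean())
    return rmse / obs.std()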


2020 ◽  
Vol 5 ◽  
Author(s):  
Kentaro Kumagai

Many public facilities such as community halls and gymnasiums are designated as evacuation sites when disasters occur. From the viewpoint of managing such facilities, it is necessary to monitor their usage and to respond immediately when an anomaly occurs. In this study, an integrated system of IoT sensors and machine learning for anomaly detection of pedestrian flow was proposed for buildings expected to serve as emergency evacuation sites in the event of a disaster. As a trial of the system, infrared sensors were installed in a research building of a university, and data on visitors to the fourth floor of the building were collected as time series data of pedestrian flow. The results showed that anomalies of pedestrian flow at an arbitrary time of day with an occurrence probability of 5% or less can be detected properly using the collected data.
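A minimal sketch of the 5%-probability anomaly rule described above: a pedestrian count for a given time slot is flagged when it falls outside the central 95% band of the historical counts for that slot. The quantile-based formulation is an assumption, since the study does not specify its detection model.

```python
import numpy as np

def is_anomalous(history, value, alpha=0.05):
    """Flag `value` as an anomaly if it lies outside the central
    1 - alpha band of historical counts for the same time slot,
    i.e. an occurrence probability of alpha (5%) or less."""
    lo, hi = np.quantile(history, [alpha / 2, 1 - alpha / 2])
    return value < lo or value > hi
```

In the deployed system, `history` would hold past sensor counts for the same time-of-day slot, so the threshold adapts to normal daily usage patterns.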


The main objective of this paper is to analyze the characteristics and features that affect the fluctuations of cryptocurrency prices and to develop an interactive cryptocurrency chatbot for providing predictive analysis of cryptocurrency prices. The chatbot is developed using the IBM Watson Assistant service. The predictive analytics is performed by analyzing the datasets of various cryptocurrencies and applying appropriate time series models. Time series forecasting is used for predicting the future values of the prices. Predictive models such as ARIMA are used, with the mean squared error of the fitted model computed for evaluation. Facebook’s Prophet package, which implements a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly and weekly seasonality, is further used to predict cryptocurrency prices.
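The autoregressive core of ARIMA-style forecasting can be sketched without external dependencies as below: an AR(p) model is fit by least squares and iterated forward. This is a simplified stand-in for the full ARIMA and Prophet models used in the paper, which add differencing, moving-average terms and seasonality.

```python
import numpy as np

def ar_forecast(series, p=2, steps=5):
    """Fit an AR(p) model by least squares and forecast `steps`
    values ahead: the autoregressive component of the ARIMA family."""
    y = np.asarray(series, float)
    # Design matrix of lagged values: column k holds lag k+1.
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    X = np.c_[np.ones(len(X)), X]
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    out = list(y)
    for _ in range(steps):
        lags = out[-1:-p - 1:-1]             # most recent p values first
        out.append(beta[0] + np.dot(beta[1:], lags))
    return np.array(out[len(y):])
```

Evaluating such a forecast with mean squared error against a held-out window mirrors the model-selection step the paper describes.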

