Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Xiaobo Yan ◽  
Weiqing Xiong ◽  
Liang Hu ◽  
Feng Wang ◽  
Kuo Zhao

This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics and healthcare. However, missing values are very common in the IoT for a variety of reasons, which leaves the experimental data incomplete. As a result, some work that depends on IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data and the features of missing data in the IoT, this paper divides missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve these problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on Gaussian mixture model (MGI). Experimental results showed that the three models can greatly improve the accuracy, reliability, and stability of missing value imputation.
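
As a rough illustration of imputing a sensor gap from its temporal context, the sketch below fills each missing reading by linear interpolation between the nearest observed neighbours. It is not the paper's MCL model, whose details the abstract does not give; the function name `impute_linear` and the sample readings are assumptions made for this example.

```python
from typing import List, Optional


def impute_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries by linear interpolation between the nearest
    observed neighbours (hypothetical helper, not the paper's MCL)."""
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        return filled  # nothing to anchor the interpolation on
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            # Gap at the start: copy the first observed value.
            filled[i] = series[right]
        elif right is None:
            # Gap at the end: copy the last observed value.
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Assumed temperature readings with a two-step gap and a trailing gap.
readings = [20.0, None, None, 23.0, None]
print(impute_linear(readings))
```

Interior gaps get a value on the straight line between their neighbours, while gaps at either end simply repeat the closest observation.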

PLoS ONE ◽  
2016 ◽  
Vol 11 (8) ◽  
pp. e0161112 ◽  
Author(s):  
Jing Xiao ◽  
Qiongqiong Xu ◽  
Chuanli Wu ◽  
Yuexia Gao ◽  
Tianqi Hua ◽  
...  

Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.
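
The feature-wise selection idea can be sketched as follows: for each feature, hide every known value in turn, estimate it with each candidate method, and keep the method with the lowest mean absolute error. The three candidates here (mean, median, previous observation) are stand-ins chosen for illustration, since the abstract does not name the five methods the authors ranked, and all function names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional

Column = List[Optional[float]]


def _others(col: Column, idx: int) -> List[float]:
    """Known values of the column, excluding the position being estimated."""
    return [v for i, v in enumerate(col) if v is not None and i != idx]


def mean_fill(col: Column, idx: int) -> float:
    return mean(_others(col, idx))


def median_fill(col: Column, idx: int) -> float:
    return median(_others(col, idx))


def previous_fill(col: Column, idx: int) -> float:
    """Carry the last earlier observation forward; fall back to the mean."""
    for j in range(idx - 1, -1, -1):
        if col[j] is not None:
            return col[j]
    return mean_fill(col, idx)


# Hypothetical candidate set, not the five methods used in the article.
METHODS: Dict[str, Callable[[Column, int], float]] = {
    "mean": mean_fill,
    "median": median_fill,
    "previous": previous_fill,
}


def select_method(col: Column) -> str:
    """Rank candidates by mean absolute error on hidden known values."""
    known = [i for i, v in enumerate(col) if v is not None]
    if len(known) < 2:
        return "mean"  # too little information to rank the candidates
    errors = {
        name: mean(abs(fn(col, i) - col[i]) for i in known)
        for name, fn in METHODS.items()
    }
    return min(errors, key=errors.get)


trend = [1.0, 2.0, 3.0, None, 5.0, 6.0]      # steadily increasing feature
spiky = [10.0, 10.0, None, 50.0, 10.0, 10.0]  # flat feature with one outlier
print(select_method(trend), select_method(spiky))
```

Selecting per feature lets a trending measurement use a temporal method while a noisy, flat one uses a robust statistic, which is the motivation the abstract gives for not committing to a single method.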


2019 ◽  
Vol 8 (3) ◽  
pp. 3375-3380 ◽  

The Internet of Things (IoT) is a new communication paradigm in which the internet extends from the virtual world to interact with objects in the physical world. It opens up a new dimension of services, but considerable challenges must be overcome to reap the full benefits of the IoT. One such challenge is missing data imputation. Missing values hamper subsequent processes such as prediction, control, and decision making, because these processes depend on complete information. In this paper, a novel Fuzzy Rule-Based Imputation Model (FRBIM) is proposed to impute missing data based on the characteristics of IoT data and achieve high accuracy. Experimental results show that the proposed method outperforms the existing KNN and AKE imputation models in terms of accuracy.


Author(s):  
Qingjuan Li ◽  
Huansheng Ning ◽  
Tao Zhu ◽  
Shan Cui ◽  
Liming Chen

With the rapid development and large-scale uptake of the Internet of Things, the smart home is evolving from a vision towards a realistically viable solution for assisted living. Activity recognition is one of the fundamental tasks in order to provide accurate and timely assistance and service. As daily living scenarios are full of similar activities, missing data, and noise, inferring complex activities using knowledge-driven reasoning algorithms suffers from several drawbacks, e.g., real-time raw sensor data segmentation, poor generalization, higher computational complexity, and scalability. To address these problems, this paper proposes a hybrid approach to complex daily activity recognition by merging first-order logic and probabilistic graphical modeling. Specifically, we develop a novel “Markov logic network” combining data-driven multi-feature and simplified rule-based modeling and inference, thus enabling and supporting the applicability and robustness of daily activity recognition. To evaluate the approach and associated methods, we design a testing scenario with a number of similar activity groups, missing data, or disturbance test datasets in a multi-modeling sensor scene. Initial results show that our approach outperforms the traditional approach, with better accuracy in situations of similar activities with missing data and noise disturbance. Experiments are also conducted to compare the Gibbs sampling and MC-SAT sampling algorithms for the Markov logic network, and the results show that Gibbs sampling performs better in our experimental settings.


2021 ◽  
Vol 14 (11) ◽  
pp. 2533-2545
Author(s):  
Parikshit Bansal ◽  
Prathamesh Deshpande ◽  
Sunita Sarawagi

We present DeepMVI, a deep learning method for missing value imputation in multidimensional time-series datasets. Missing values are commonplace in decision support platforms that aggregate data over long time stretches from disparate sources, whereas reliable data analytics calls for careful handling of missing data. One strategy is imputing the missing values, and a wide variety of algorithms exist spanning simple interpolation, matrix factorization methods like SVD, statistical models like Kalman filters, and recent deep learning methods. We show that often these provide worse results on aggregate analytics compared to just excluding the missing data. DeepMVI expresses the distribution of each missing value conditioned on coarse and fine-grained signals along a time series, and signals from correlated series at the same time. Instead of resorting to linearity assumptions of conventional matrix factorization methods, DeepMVI harnesses a flexible deep network to extract and combine these signals in an end-to-end manner. To prevent over-fitting with high-capacity neural networks, we design a robust parameter training with labeled data created using synthetic missing blocks around available indices. Our neural network uses a modular design with a novel temporal transformer with convolutional features, and kernel regression with learned embeddings. Experiments across ten real datasets, five different missing scenarios, comparing seven conventional and three deep learning methods show that DeepMVI is significantly more accurate, reducing error by more than 50% in more than half the cases, compared to the best existing method. Although slower than simpler matrix factorization methods, we justify the increased time overheads by showing that DeepMVI provides significantly more accurate imputation that finally impacts quality of downstream analytics.
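
The idea of creating labeled training data from synthetic missing blocks can be sketched in a few lines: pick a fully observed window, hide it, and keep the hidden values as targets. This is only an illustration of the masking step under assumed names (`make_synthetic_block`, `block_len`); it says nothing about how DeepMVI itself samples blocks or trains its network.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_synthetic_block(
    series: List[Optional[float]], block_len: int, rng: random.Random
) -> Tuple[List[Optional[float]], Dict[int, float]]:
    """Hide one contiguous, fully observed block and return
    (masked series, {index: true value}) as a training example."""
    n = len(series)
    # Candidate start positions whose whole window is observed.
    starts = [
        s
        for s in range(n - block_len + 1)
        if all(series[i] is not None for i in range(s, s + block_len))
    ]
    if not starts:
        raise ValueError("no fully observed window of that length")
    start = rng.choice(starts)
    masked = list(series)
    targets: Dict[int, float] = {}
    for i in range(start, start + block_len):
        targets[i] = series[i]  # ground truth the model should recover
        masked[i] = None        # simulated missing value
    return masked, targets


series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0]
masked, targets = make_synthetic_block(series, block_len=3, rng=random.Random(0))
print(masked, targets)
```

Because the hidden values are known, an imputation model can be scored or trained on them without needing any externally labeled data.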


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
M. Sathya ◽  
M. Jeyaselvi ◽  
Lalitha Krishnasamy ◽  
Mohammad Mazyad Hazzazi ◽  
Prashant Kumar Shukla ◽  
...  

The Internet of Things (IoT) is enhancing our lives in a variety of settings, including smarter cities, agribusiness, and e-healthcare, among others. Even though the Internet of Things shares many features with the consumer Internet of Things, the open nature of smart devices and their worldwide connection make IoT networks vulnerable to a variety of attacks. Several existing approaches focus on attack detection in IoT devices, but they suffer from long computation times and low accuracy. This paper proposes an attack detection framework for IoT devices, based on the DWU-ODBN method, to alleviate these problems. The framework comprises preprocessing, feature extraction, feature selection, and classification steps to identify the source of the attack. Preprocessing handles NaN values, categorical features, and missing values, and a random oversampler is used to deal with the imbalanced dataset. The preprocessed data is then sent to the MAD Median-KS test method, which is used to extract features from the dataset. To categorize the data into attack and nonattack categories, the features are classified using the dual weight updation-based optimal deep belief network (DWU-ODBN) classification technique. According to the experimental assessment, the proposed approach outperforms existing methods in detecting intrusions and attacks. The proposed work performs attack detection in 77 seconds with an accuracy rate of 98.1%.
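
Random oversampling, the balancing step mentioned in the abstract, can be illustrated with a minimal sketch: minority-class rows are duplicated at random until every class has as many rows as the largest one. The function name `random_oversample` and the toy data are assumptions for this example and do not reflect the authors' implementation.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    rows: Sequence[list], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[list], List[Hashable]]:
    """Duplicate randomly chosen rows of smaller classes until all
    classes match the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows: List[list] = []
    out_labels: List[Hashable] = []
    for label, members in by_class.items():
        # Sample with replacement to make up the shortfall.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Assumed toy traffic data: four normal flows and one attack flow.
rows = [[0.1], [0.2], [0.3], [0.4], [9.0]]
labels = ["normal", "normal", "normal", "normal", "attack"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, since duplicated rows leaking into a test split would inflate the measured accuracy.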


The risk of malware within Internet of Things (IoT) environments is increasing due to a lack of detectors. This paper proposes a way to predict malware intrusion using state-of-the-art machine learning algorithms that can detect malware faster and more accurately than existing methods (that is, payload, port-based, and statistical techniques). A smart office environment was implemented to capture the flow of packet datasets, where malware and normal packets were captured, and eleven features were extracted from them. Four machine learning algorithms (random forest, a support vector machine, AdaBoost, and a Gaussian mixture model-based naive Bayes classifier) were investigated to implement the automatic malware monitoring system. Random forest and AdaBoost were able to separate the malware and normal flows perfectly, owing to their ensemble structures, which can classify unbalanced and noisy datasets.

