Missing Data Problem
Recently Published Documents


TOTAL DOCUMENTS: 55 (FIVE YEARS: 13)

H-INDEX: 9 (FIVE YEARS: 1)

2021 ◽  
pp. 429-452
Author(s):  
Ruth H. Keogh ◽  
Jonathan W. Bartlett

Author(s):  
Pengfei Zhang ◽  
Zhenliang Ma ◽  
Xiaoxiong Weng ◽  
Haris N. Koutsopoulos

Data quality is the foundation of data-driven applications in transportation. Problems such as missing or invalid data can sharply reduce the performance of the methods these applications rely on. Although many studies address data quality issues, they focus on missing or invalid data caused by infrastructure failures (e.g., loop detector malfunction); data quality problems arising from insufficient data management have received little attention. This paper proposes a tensor-decomposition-based framework to tackle a specific missing data problem that occurs when the machine-station dictionary of an automated fare collection (AFC) system database is incomplete. In such cases, a large amount of origin/destination information is lost because the affected fare machines are not linked to any station, so all of their transactions may lack origin or destination information. The proposed framework recovers the dictionary by capturing features of the passenger flow passing through each unlinked fare machine. Evaluation results show that the approach recovers the missing links with high accuracy even when several fare machines are not linked to a station. The framework could also support other beneficial applications.
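
The abstract does not spell out the decomposition, but the core idea (profile each fare machine's passenger flow over time, compress the profiles with a low-rank factorization, and assign each unlinked machine to the station whose linked machines it most resembles) can be sketched as follows. This is a minimal illustration, not the authors' method: the plain SVD stands in for their tensor decomposition, and all names and shapes are assumptions.

```python
# Illustrative sketch (not the paper's exact method): recover machine-station
# links by matching low-rank passenger-flow profiles.
import numpy as np

def recover_station_links(flows, station_of, rank=10):
    """flows: (n_machines, n_time_bins) tap counts per fare machine.
    station_of: dict machine_index -> station_id for *linked* machines only.
    Returns a guessed station for every unlinked machine."""
    # Low-rank embedding of the temporal flow profiles via truncated SVD
    # (a stand-in for the framework's tensor decomposition).
    U, s, Vt = np.linalg.svd(flows, full_matrices=False)
    emb = U[:, :rank] * s[:rank]                 # one embedding row per machine
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12

    # One centroid embedding per station, averaged over its linked machines.
    stations = sorted(set(station_of.values()))
    centroids = np.stack([
        emb[[m for m, st in station_of.items() if st == s_]].mean(axis=0)
        for s_ in stations
    ])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12

    # Assign each unlinked machine to the most similar station centroid.
    guessed = {}
    for m in range(flows.shape[0]):
        if m not in station_of:
            sims = centroids @ emb[m]            # cosine similarity
            guessed[m] = stations[int(np.argmax(sims))]
    return guessed
```

The design choice mirrors the abstract's logic: machines at the same station see similar passenger-flow patterns, so an unlinked machine's flow profile is the signal used to restore its dictionary entry.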


Author(s):  
Mr Almelu ◽  
Dr. S. Veenadhari ◽  
Kamini Maheshwar

Internet of Things (IoT) systems generate large amounts of sensing data, and the consistency of these data is essential for ensuring the quality of IoT services. IoT data, however, commonly suffer from problems caused by collisions, unstable network communication, noise, manual system shutdown, incomplete values, and equipment failure. Because of excessive latency, bandwidth limitations, and high communication costs, transferring all IoT data to the cloud to solve the missing data problem can degrade network performance and service quality. The missing data problem should therefore be addressed as close to the source as feasible, by offloading tasks such as data prediction and estimation to the network's edge devices. In this work, we show how deep learning may be used for such offloaded tasks in IoT applications.
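
The abstract does not name an architecture; a common pattern for edge-side imputation of this kind is a small denoising autoencoder that learns to reconstruct complete sensor vectors from masked ones. The sketch below is a hypothetical instance in PyTorch; the layer sizes, masking rate, and training loop are assumptions, not the authors' design.

```python
# Minimal sketch of edge-side sensor imputation with a denoising
# autoencoder (an assumed architecture; the paper's model is not specified).
import torch
import torch.nn as nn

class Imputer(nn.Module):
    def __init__(self, n_sensors, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_sensors, hidden), nn.ReLU(),
            nn.Linear(hidden, n_sensors),
        )

    def forward(self, x):
        return self.net(x)

def train_imputer(readings, epochs=200, mask_rate=0.2, lr=1e-3):
    """readings: (n_samples, n_sensors) float tensor of complete historical
    data. Trains the model to reconstruct randomly masked-out values."""
    model = Imputer(readings.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        mask = (torch.rand_like(readings) > mask_rate).float()
        pred = model(readings * mask)      # zeros stand in for missing values
        loss = ((pred - readings) ** 2 * (1 - mask)).mean()  # score masked cells
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def impute(model, x, missing_mask):
    """Fill only the truly missing cells of a single reading on the device."""
    with torch.no_grad():
        pred = model(torch.where(missing_mask, torch.zeros_like(x), x))
    return torch.where(missing_mask, pred, x)
```

A model this small can run on a constrained edge device, which is the point of offloading: the raw stream never has to reach the cloud for the gaps to be filled.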


Author(s):  
Karim H. Erian ◽  
Pedro H. Regalado ◽  
James M. Conrad

This paper discusses a novel algorithm for solving a missing data problem in the machine learning pre-processing stage. As an example, it adopts a model built to help lenders evaluate home loans based on numerous factors by learning from available user data. If one of the factors is missing for a person in the dataset, currently used methods delete the whole entry, reducing the size of the dataset and hurting the machine learning model's accuracy. The novel algorithm avoids losing entries with missing factors by breaking the dataset into multiple subsets, building a separate machine learning model for each subset, and then combining the models into one. In this manner, the model makes use of all available data and neglects only the missing values themselves. In the home loan example, the new algorithm improved prediction accuracy by 5 percentage points, from 93% to 98%.
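
The abstract leaves the combination step unspecified, but the general "one model per missingness pattern" idea (train a submodel on the rows sharing each pattern of observed columns, then route every record to the submodel matching its pattern) can be sketched as below. The use of scikit-learn's LogisticRegression, the minimum subset size, and all names are illustrative assumptions.

```python
# Illustrative "one model per missingness pattern" sketch (assumed details;
# the paper's combination step is not given in the abstract).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_pattern_models(X, y):
    """X: (n, d) feature matrix with np.nan marking missing values; y: labels.
    Returns {pattern: model}, where a pattern is a tuple of observed-column
    flags."""
    observed = ~np.isnan(X)
    models = {}
    for pattern in {tuple(row) for row in observed}:
        cols = [j for j, keep in enumerate(pattern) if keep]
        rows = np.all(observed == np.array(pattern), axis=1)
        if cols and rows.sum() >= 10:        # skip empty or tiny subsets
            models[pattern] = LogisticRegression(max_iter=1000).fit(
                X[np.ix_(rows, cols)], y[rows])
    return models

def predict(models, x):
    """Route one record to the submodel trained on exactly its observed
    columns, so no entry is ever deleted for having missing factors."""
    pattern = tuple(~np.isnan(x))
    cols = [j for j, keep in enumerate(pattern) if keep]
    return models[pattern].predict(x[cols].reshape(1, -1))[0]
```

This realizes the abstract's claim directly: every applicant contributes to training, and only the individual missing factors are excluded rather than the whole record.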


2020 ◽  
Vol 20 (23) ◽  
pp. 13984-13998
Author(s):  
Jinghan Du ◽  
Minghua Hu ◽  
Weining Zhang

2020 ◽  
Author(s):  
Bryor Snefjella ◽  
Idan Blank

For close to 70 years, psychologists have studied word meaning using a simple method: participants rate words on some theoretically motivated property (e.g., pleasantness, familiarity) using a Likert scale as the measurement instrument. Such semantic judgments serve as a means of interrogating the underlying structure of lexical semantic constructs, of selecting stimuli for experiments, or as covariates in models predicting brain activity or behaviour. Recently, there has been a surge of interest in using computational distributional semantic word representations and supervised learning to predict Likert-scale semantic judgments for words lacking empirical measurements. We call this task semantic norm extrapolation. A significant body of work has shown that methods for semantic norm extrapolation are often highly accurate. The impressive performance of models for this task may give the appearance that non-empirical, machine-learning-derived estimates of semantic norms are interchangeable with empirical measurements of semantic norms. Herein, we argue that this is not the case, and that all extant methods for semantic norm extrapolation are more problematic than the literature suggests: naive use of extrapolated semantic norms should be expected to yield biased and anti-conservative analyses. We make this argument using a mixture of (1) the principles of analysis of partially observed data, (2) simulations, and (3) a real-data example. Achieving sound inference when using semantic norm extrapolation requires a conceptual and methodological shift from treating it as a prediction problem to treating it as a missing data problem. This shift in perspective also lays bare problems in the default analytical procedures for semantic norms and megastudy data, and it surprisingly suggests that semantic norm extrapolation, when done using recommended procedures for the analysis of partially observed data, should be default methodological practice.
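
To make the recommended shift concrete: instead of plugging a single predicted norm in for each unmeasured word, one can treat those words as missing data, generate several imputed datasets, run the downstream analysis on each, and pool with Rubin's rules. The sketch below uses scikit-learn's IterativeImputer with posterior sampling as a generic multiple-imputation engine; it stands in for, and is not, the authors' specific recommended procedure.

```python
# Generic multiple-imputation sketch for extrapolated norms (a stand-in for
# the recommended partially-observed-data analysis, not the paper's code).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pooled_mean(data, target_col, n_imputations=20):
    """data: (n_words, d) array of word features plus norm columns, with
    np.nan where a word lacks an empirical rating. Pools the mean of
    `target_col` across imputations using Rubin's rules."""
    estimates, within_vars = [], []
    for seed in range(n_imputations):
        imp = IterativeImputer(sample_posterior=True, random_state=seed)
        completed = imp.fit_transform(data)
        col = completed[:, target_col]
        estimates.append(col.mean())
        within_vars.append(col.var(ddof=1) / len(col))  # within-imputation var
    q = np.mean(estimates)                               # pooled estimate
    w = np.mean(within_vars)                             # within variance
    b = np.var(estimates, ddof=1)                        # between variance
    total_var = w + (1 + 1 / n_imputations) * b          # Rubin's rules
    return q, np.sqrt(total_var)
```

The between-imputation variance b is the term a single plug-in prediction silently sets to zero, which is exactly the anti-conservative behaviour the paper warns about.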


Bad Science ◽  
2020 ◽  
pp. 135-138
Author(s):  
Florian Meinfelder ◽  
Rebekka Kluge
