Incomplete Data Matrices and Tests on Randomly Missing Data

Author(s):  
U. Bankhofer
2015 ◽  
Vol 143 ◽  
pp. 146-151 ◽  
Author(s):  
Laura Folguera ◽  
Jure Zupan ◽  
Daniel Cicerone ◽  
Jorge F. Magallanes

2015 ◽  
Vol 4 (2) ◽  
pp. 74
Author(s):  
MADE SUSILAWATI ◽  
KARTIKA SARI

Missing data often occur in agriculture and animal husbandry experiment. The missing data in experimental design makes the information that we get less complete. In this research, the missing data was estimated with Yates method and Expectation Maximization (EM) algorithm. The basic concept of the Yates method is to minimize sum square error (JKG), meanwhile the basic concept of the EM algorithm is to maximize the likelihood function. This research applied Balanced Lattice Design with 9 treatments, 4 replications and 3 group of each repetition. Missing data estimation results showed that the Yates method was better used for two of missing data in the position on a treatment, a column and random, meanwhile the EM algorithm was better used to estimate one of missing data and two of missing data in the position of a group and a replication. The comparison of the result JKG of ANOVA showed that JKG of incomplete data larger than JKG of incomplete data that has been added with estimator of data. This suggest  thatwe need to estimate the missing data.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sonia Goel ◽  
Meena Tushir

Purpose In real-world decision-making, high accuracy data analysis is essential in a ubiquitous environment. However, we encounter missing data while collecting user-related data information because of various privacy concerns on account of a user. This paper aims to deal with incomplete data for fuzzy model identification, a new method of parameter estimation of a Takagi–Sugeno model in the presence of missing features. Design/methodology/approach In this work, authors proposed a three-fold approach for fuzzy model identification in which imputation-based linear interpolation technique is used to estimate missing features of the data, and then fuzzy c-means clustering is used for determining optimal number of rules and for the determination of parameters of membership functions of the fuzzy model. Finally, the optimization of the all antecedent and consequent parameters along with the width of the antecedent (Gaussian) membership function is done by gradient descent algorithm based on the minimization of root mean square error. Findings The proposed method is tested on two well-known simulation examples as well as on a real data set, and the performance is compared with some traditional methods. The result analysis and statistical analysis show that the proposed model has achieved a considerable improvement in accuracy in the presence of varying degree of data incompleteness. Originality/value The proposed method works well for fuzzy model identification method, a new method of parameter estimation of a Takagi–Sugeno model in the presence of missing features with varying degree of missing data as compared to some well-known methods.


2021 ◽  
Vol 18 (1) ◽  
pp. 22-30
Author(s):  
Erna Nurmawati ◽  
Robby Hasan Pangaribuan ◽  
Ibnu Santoso

One way to deal with the presence of missing value or incomplete data is to impute the data using EM Algorithm. The need for large and fast data processing is necessary to implement parallel computing on EM algorithm serial program. In the parallel program architecture of EM Algorithm in this study, the controller is only related to the EM module whereas the EM module itself uses matrix and vector modules intensively. Parallelization is done by using OpenMP in EM modules which results in faster compute time on parallel programs than serial programs. Parallel computing with a thread of 4 (four) increases speed up, reduces compute time, and reduces efficiency when compared to parallel computing by the number of threads 2 (two).


Author(s):  
Pedro J. García-Laencina ◽  
Juan Morales-Sánchez ◽  
Rafael Verdú-Monedero ◽  
Jorge Larrey-Ruiz ◽  
José-Luis Sancho-Gómez ◽  
...  

Many real-word classification scenarios suffer a common drawback: missing, or incomplete, data. The ability of missing data handling has become a fundamental requirement for pattern classification because the absence of certain values for relevant data attributes can seriously affect the accuracy of classification results. This chapter focuses on incomplete pattern classification. The research works on this topic currently grows wider and it is well known how useful and efficient are most of the solutions based on machine learning. This chapter analyzes the most popular and proper missing data techniques based on machine learning for solving pattern classification tasks, trying to highlight their advantages and disadvantages.


Author(s):  
Hai Wang ◽  
Shouhong Wang

Survey is one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Commonly, surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data could be numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage of missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, but is one of the most important parts of knowledge for data mining to discover. In fact, missing data is an important debatable issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003).


2021 ◽  
Vol 49 (2) ◽  
Author(s):  
Changxiao Cai ◽  
Gen Li ◽  
Yuejie Chi ◽  
H. Vincent Poor ◽  
Yuxin Chen

Econometrics ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 37 ◽  
Author(s):  
Golden ◽  
Henley ◽  
White ◽  
Kashner

Researchers are often faced with the challenge of developing statistical models with incomplete data. Exacerbating this situation is the possibility that either the researcher’s complete-data model or the model of the missing-data mechanism is misspecified. In this article, we create a formal theoretical framework for developing statistical models and detecting model misspecification in the presence of incomplete data where maximum likelihood estimates are obtained by maximizing the observable-data likelihood function when the missing-data mechanism is assumed ignorable. First, we provide sufficient regularity conditions on the researcher’s complete-data model to characterize the asymptotic behavior of maximum likelihood estimates in the simultaneous presence of both missing data and model misspecification. These results are then used to derive robust hypothesis testing methods for possibly misspecified models in the presence of Missing at Random (MAR) or Missing Not at Random (MNAR) missing data. Second, we introduce a method for the detection of model misspecification in missing data problems using recently developed Generalized Information Matrix Tests (GIMT). Third, we identify regularity conditions for the Missing Information Principle (MIP) to hold in the presence of model misspecification so as to provide useful computational covariance matrix estimation formulas. Fourth, we provide regularity conditions that ensure the observable-data expected negative log-likelihood function is convex in the presence of partially observable data when the amount of missingness is sufficiently small and the complete-data likelihood is convex. Fifth, we show that when the researcher has correctly specified a complete-data model with a convex negative likelihood function and an ignorable missing-data mechanism, then its strict local minimizer is the true parameter value for the complete-data model when the amount of missingness is sufficiently small. Our results thus provide new robust estimation, inference, and specification analysis methods for developing statistical models with incomplete data.


Author(s):  
Hai Wang ◽  
Shouhong Wang

Survey is one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Commonly, surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data could be numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage of missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, but is one of the most important parts of knowledge for data mining to discover. In fact, missing data is an important debatable issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003). In mining a survey database with incomplete data, patterns of the missing data as well as the potential impacts of these missing data on the mining results constitute valuable knowledge. For instance, a data miner often wishes to know how reliable a data mining result is, if only the complete data entries are used; when and why certain types of values are often missing; what variables are correlated in terms of having missing values at the same time; what reason for incomplete data is likely, etc. These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.


2008 ◽  
pp. 3027-3032
Author(s):  
Hai Wang ◽  
Shouhong Wang

Survey is one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Commonly, surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data could be numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage of missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, but is one of the most important parts of knowledge for data mining to discover. In fact, missing data is an important debatable issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003).


Sign in / Sign up

Export Citation Format

Share Document