Data Mining-Based Handling of Missing Data

Surveys are one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data are numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage caused by missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, yet it is one of the most important pieces of knowledge for data mining to discover. In fact, missing data is an important and debated issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003).

Data Mining with Incomplete Data

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 526-530

DOI: 10.4018/978-1-60566-010-3.ch082

Cited By ~ 1

Author(s): Hai Wang, Shouhong Wang

Keyword(s): Data Mining, Missing Data, Incomplete Data, Missing Values, Knowledge Engineering, Data Set, Survey Question, Data Source, The Common, Extent Of Damage

Surveys are one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data are numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage caused by missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, yet it is one of the most important pieces of knowledge for data mining to discover. In fact, missing data is an important and debated issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003). In mining a survey database with incomplete data, the patterns of the missing data, as well as the potential impact of the missing data on the mining results, constitute valuable knowledge. For instance, a data miner often wishes to know how reliable a data mining result is if only the complete data entries are used; when and why certain types of values are often missing; which variables are correlated in terms of having missing values at the same time; which reason for incomplete data is likely; and so on. These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.
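The questions listed in the abstract above (how much data survives if only complete entries are used, which values are often missing, and which variables tend to be missing together) can be illustrated with a minimal sketch. This is not the authors' method; it is a plain descriptive summary under stated assumptions. The survey records, the variable names, the use of `None` as the missing marker, and the three helper functions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical survey records; None marks an unanswered question.
survey = [
    {"age": 34, "income": None, "satisfaction": 4},
    {"age": None, "income": None, "satisfaction": 5},
    {"age": 51, "income": 62000, "satisfaction": 3},
    {"age": 29, "income": 48000, "satisfaction": None},
]
variables = ["age", "income", "satisfaction"]


def missing_rate(rows, names):
    """Fraction of rows in which each variable is missing."""
    return {v: sum(r[v] is None for r in rows) / len(rows) for v in names}


def complete_case_fraction(rows, names):
    """Fraction of rows that would remain if incomplete entries were dropped."""
    complete = sum(all(r[v] is not None for v in names) for r in rows)
    return complete / len(rows)


def co_missing_counts(rows, names):
    """Number of rows in which both variables of a pair are missing."""
    return {
        (a, b): sum(r[a] is None and r[b] is None for r in rows)
        for a, b in combinations(names, 2)
    }


rates = missing_rate(survey, variables)
complete = complete_case_fraction(survey, variables)
pairs = co_missing_counts(survey, variables)

print(rates)
print(complete)
print(pairs)
```

On real survey data, a low complete-case fraction would suggest that a mining result based only on complete entries rests on a small and possibly unrepresentative subset, and high co-missing counts would point to variables worth examining together.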

Data Mining with Incomplete Data

Data Warehousing and Mining, 2008, pp. 3027-3032

DOI: 10.4018/978-1-59904-951-9.ch193

Author(s): Hai Wang, Shouhong Wang

Keyword(s): Data Mining, Missing Data, Survey Data, Incomplete Data, Knowledge Engineering, Data Set, Survey Question, Data Source, The Common, Extent Of Damage

Surveys are one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data are numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of damage caused by missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, yet it is one of the most important pieces of knowledge for data mining to discover. In fact, missing data is an important and debated issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003).

Imprecise Data and the Data Mining Process

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 999-1005

DOI: 10.4018/978-1-60566-010-3.ch155

Author(s): Marvin L. Brown, John F. Kros

Keyword(s): Data Mining, Missing Data, Enterprise Resource Planning, Large Scale, Resource Planning, Future Trends, Imprecise Data, Discovery Process, Planning Systems, Enterprise Resource Planning Systems

Missing or inconsistent data has been a pervasive problem in data analysis since the origin of data collection. The management of missing data in organizations has recently received attention as more firms implement large-scale enterprise resource planning systems (see Vosburg & Kumar, 2001; Xu et al., 2002). The issue of missing data becomes an even more pervasive dilemma in the knowledge discovery process, because the more data is collected, the higher the likelihood of missing data becomes. The objective of this chapter is to discuss imprecise data and the data mining process. The chapter begins with a background analysis, including a brief review of both seminal and current literature. The main thrust of the chapter focuses on reasons for data inconsistency along with definitions of various types of missing data. Future trends followed by concluding remarks complete the chapter.
