Exploiting data preparation to enhance mining and knowledge discovery

Author(s):  
B. Rajagopalan ◽  
M.W. Isken

Data quality is a main issue in quality information management. Data quality problems occur anywhere in information systems. These problems are solved by Data Cleaning (DC). DC is a process used to determine inaccurate, incomplete or unreasonable data and then improve the quality through correcting of detected errors and omissions. Various process of DC have been discussed in the previous studies, but there is no standard or formalized the DC process. The Domain Driven Data Mining (DDDM) is one of the KDD methodology often used for this purpose. This paper review and emphasize the important of DC in data preparation. The future works was also being highlight.


2010 ◽  
Vol 25 (1) ◽  
pp. 49-67 ◽  
Author(s):  
Sumana Sharma ◽  
Kweku-Muata Osei-Bryson

AbstractThe knowledge discovery and data mining (KDDM) process models describe the various phases (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment) of the KDDM process. They act as a roadmap for implementation of the KDDM process by presenting a list of tasks for executing the various phases. The checklist approach of describing the tasks is not adequately supported by appropriate tools, which specify ‘how’ the particular task can be implemented. This may result in tasks not being implemented. Another disadvantage is that the long checklist does not capture or leverage the dependencies that exist among the various tasks of the same and different phases. This not only makes the process cumbersome to implement, but also hinders possibilities for semi-automation of certain tasks. Given that each task in the process model serves an important goal and even affects the execution of related tasks due to the dependencies, these limitations are likely to negatively affect the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools for supporting each task as well as identifying and leveraging dependencies among tasks for semi-automation of tasks, wherever possible.


Sign in / Sign up

Export Citation Format

Share Document