Data Cleaning in Knowledge Discovery Database-Data Mining (KDD-DM)

Data quality is a central issue in quality information management, and data quality problems can occur anywhere in information systems. These problems are addressed by Data Cleaning (DC). DC is the process of detecting inaccurate, incomplete, or unreasonable data and then improving quality by correcting the detected errors and omissions. Various DC processes have been discussed in previous studies, but there is no standard or formalized DC process. Domain Driven Data Mining (DDDM) is one of the KDD methodologies often used for this purpose. This paper reviews and emphasizes the importance of DC in data preparation, and also highlights directions for future work.
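To make the DC step concrete, here is a minimal sketch of the kind of detection-and-correction pass the abstract describes, assuming the pandas library; the table, column names, and validity rules are illustrative assumptions rather than anything from the paper.

```python
# Minimal data-cleaning sketch (illustrative assumptions only):
# detect incomplete, duplicated, and unreasonable values, then correct them.
import pandas as pd

# Hypothetical sales records; column names and valid ranges are assumed.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, None],
    "age":         [34.0, -5.0, -5.0, 41.0, 29.0],   # -5 is an unreasonable value
    "amount":      [120.0, 80.5, 80.5, None, 60.0],
})

df = df.drop_duplicates()                        # remove exact duplicates
df = df.dropna(subset=["customer_id"])           # drop rows missing the key field
df.loc[~df["age"].between(0, 120), "age"] = float("nan")    # flag unreasonable ages
df["age"] = df["age"].fillna(df["age"].median())            # impute detected omissions
df["amount"] = df["amount"].fillna(df["amount"].mean())

print(df)
```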

Author(s):  
Feyza Gürbüz ◽  
Fatma Gökçe Önen

The previous decades have witnessed major change within the Information Systems (IS) environment, with a corresponding emphasis on the importance of specifying timely and accurate information strategies. Currently, there is increasing interest in data mining and in the optimization of information systems, and data mining for the optimization of information systems has therefore become a new and growing research community. This chapter surveys the application of data mining to the optimization of information systems. These systems have different data sources and, accordingly, different objectives for knowledge discovery. After the preprocessing stage, data mining techniques can be applied to the suitable data for the objectives of the information systems. These techniques are prediction, classification, association rule mining, statistics and visualization, clustering, and outlier detection.
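As a hedged illustration of two of the techniques listed above (not code from the chapter itself), the following sketch runs clustering and outlier detection over synthetic information-system metrics, assuming scikit-learn and NumPy are available.

```python
# Illustrative sketch: clustering and outlier detection on synthetic
# per-session metrics of an information system (assumed features).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical features: response time (ms) and queries issued per session.
X = rng.normal(loc=[200.0, 15.0], scale=[40.0, 4.0], size=(300, 2))
X[:5] += [800.0, 60.0]                    # inject a few anomalous sessions

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(X)

print("cluster sizes:", np.bincount(clusters))
print("sessions flagged as outliers:", int((flags == -1).sum()))
```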


2010 ◽  
Vol 25 (1) ◽  
pp. 49-67 ◽  
Author(s):  
Sumana Sharma ◽  
Kweku-Muata Osei-Bryson

Abstract: The knowledge discovery and data mining (KDDM) process models describe the various phases (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment) of the KDDM process. They act as a roadmap for implementation of the KDDM process by presenting a list of tasks for executing the various phases. The checklist approach of describing the tasks is not adequately supported by appropriate tools, which specify ‘how’ the particular task can be implemented. This may result in tasks not being implemented. Another disadvantage is that the long checklist does not capture or leverage the dependencies that exist among the various tasks of the same and different phases. This not only makes the process cumbersome to implement, but also hinders possibilities for semi-automation of certain tasks. Given that each task in the process model serves an important goal and even affects the execution of related tasks due to the dependencies, these limitations are likely to negatively affect the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools for supporting each task as well as identifying and leveraging dependencies among tasks for semi-automation of tasks, wherever possible.
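To illustrate how such dependencies could be captured and leveraged for semi-automation (this is a sketch under assumed task names and edges, not the paper's model), the tasks can be stored as a directed acyclic graph and ordered with Python's standard-library graphlib:

```python
# Sketch: model dependencies among KDDM tasks as a DAG and derive a
# feasible execution order, a first step toward semi-automating the process.
from graphlib import TopologicalSorter

# Hypothetical mapping of task -> prerequisite tasks (names are assumptions).
dependencies = {
    "assess_data_quality": {"collect_initial_data"},
    "clean_data": {"assess_data_quality"},
    "select_modeling_technique": {"determine_business_objectives"},
    "build_model": {"clean_data", "select_modeling_technique"},
    "evaluate_results": {"build_model", "determine_business_objectives"},
}

order = tuple(TopologicalSorter(dependencies).static_order())
print(" -> ".join(order))
```

Ordering the tasks this way makes the dependency structure explicit instead of leaving it implicit in a long checklist.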


Author(s):  
Pedro Fernandes Anunciação ◽  
Marina Rosa ◽  
Monique de Costa ◽  
Vanessa Oliveira

The evolution of management processes and the speed of the markets have highlighted the increasingly evident need for the sharing of information and knowledge between different economic agents. The competitiveness of economic organizations in a relational economic environment requires quality information. This is a critical success factor in the performance of economic activities. Organizations should seek to understand the internal and external dynamics inherent to the realization of their economic activities, identifying the various partners involved and integrating their information systems, among other measures. Volkswagen Autoeuropa is a reference case for management and economic organizations, one in which the importance of information in the development of activities with partners, and its centrality to the operation of the entire production chain, is evident. The objective of this study is to highlight the importance of a logistics vision in the architecture of information systems, with reference to the case of Volkswagen Autoeuropa.


Author(s):  
Stephen Makau Mutua ◽  
Raphael Angulu

Over time, ERP systems have been widely adopted across small, medium, and large organizations. An ERP system is supposed to inform the strategic decision making of the organization; therefore, the information drawn from the ERP system is as important as the data stored in it, and poor data quality degrades that information. Data mining is used to discover trends and patterns in an organization's data. This chapter looks into ways of integrating data mining into an ERP system. This is conceptualized in three crucial views, namely the outer, inner, and knowledge discovery views. The outer view comprises the collection of various entry points, the inner view contains the data repository, and the knowledge discovery view offers the data mining component. Since the focus is data mining, the two strategies of supervised and unsupervised learning are discussed. The chapter concludes by presenting the kinds of problems to which each of these two strategies (classification and clustering) can be applied within the mining process of an ERP system.
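A compact, hedged contrast of the two strategies on a synthetic ERP-style table follows, assuming scikit-learn; the purchase-order features and the labeling rule are assumptions for illustration, not the chapter's design.

```python
# Illustrative contrast: supervised (classification) vs. unsupervised
# (clustering) learning on synthetic ERP-style purchase-order data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical features: order value and days to delivery.
X = rng.normal(loc=[500.0, 7.0], scale=[150.0, 2.0], size=(200, 2))
y = (X[:, 1] > 8).astype(int)          # assumed label: delivery was late

clf = LogisticRegression(max_iter=1000).fit(X, y)         # learns from labels
segments = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)  # no labels

print("training accuracy:", round(clf.score(X, y), 2))
print("segment sizes:", np.bincount(segments))
```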


2018 ◽  
Vol 7 (2.6) ◽  
pp. 93 ◽  
Author(s):  
Deepali R Vora ◽  
Kamatchi Iyer

Educational Data Mining (EDM) is a new field of research within data mining and Knowledge Discovery in Databases (KDD). It mainly focuses on mining useful patterns and discovering useful knowledge from the educational information systems of schools, colleges, and universities. Analysing student data to perform tasks such as classifying students, or building decision trees or association rules, in order to make better decisions or to enhance student performance, is an interesting field of research. The paper presents a survey of the various tasks performed in EDM and the algorithms (methods) used for them. The paper identifies the lacunae and challenges in the algorithms applied, the performance factors considered, and the data used in EDM.
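As a hedged example of one such EDM task (not taken from the survey), the sketch below trains a decision tree to classify students from synthetic performance features, assuming scikit-learn:

```python
# Illustrative EDM sketch: classify students (pass/fail) with a decision
# tree on synthetic, assumed features; not data or code from the survey.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
# Hypothetical features: attendance rate, internal marks, assignments done.
X = rng.uniform(low=[0.4, 30.0, 0.0], high=[1.0, 100.0, 10.0], size=(400, 3))
y = (0.5 * 100 * X[:, 0] + 0.5 * X[:, 1] > 60).astype(int)   # assumed pass rule

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)
tree = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X_train, y_train)
print("test accuracy:", round(tree.score(X_test, y_test), 2))
```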


2018 ◽  
Vol 27 (47) ◽  
Author(s):  
Esther Marina Ruiz-Lobaina ◽  
Pedro Lázaro Romero-Suárez

This paper studies the patterns found using clustering and Self-Organizing Maps (SOM), both Data Mining (DM) techniques, after searching the databases, and compares them with the results retrieved from the library's Information Management System (SGI). The methodology created for this study uses the results of both processes to improve the database information, which simultaneously increases the performance of the associated SGI search engine and allows the creation, with the gathered information, of a new product that enriches the Selective Dissemination of Information (SDI).
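As a rough, hedged sketch of the two DM techniques named above (assuming scikit-learn for k-means and the third-party minisom package for the SOM; the record feature matrix is synthetic):

```python
# Illustrative sketch of the two techniques: k-means clustering and a
# Self-Organizing Map. Assumes scikit-learn and the third-party 'minisom'
# package are installed; the record features are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from minisom import MiniSom

rng = np.random.default_rng(3)
X = rng.random((150, 4))                 # e.g. vectorized catalogue records

labels = KMeans(n_clusters=4, n_init=10, random_state=3).fit_predict(X)

som = MiniSom(5, 5, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=3)
som.random_weights_init(X)
som.train_random(X, 500)                 # map records onto a 5x5 grid
cells = [som.winner(x) for x in X]       # winning grid cell per record

print("cluster sizes:", np.bincount(labels))
print("distinct SOM cells used:", len(set(cells)))
```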


2011 ◽  
pp. 277-299 ◽  
Author(s):  
Hongjiang Xu ◽  
Andy Koronios ◽  
Noel Brown

Information is the key resource of today's organizations, and therefore quality information is critical to organizations' success. Accounting information systems (AIS), in particular, require high-quality information. This chapter discusses critical success factors for data quality in accounting information systems. A model of the factors that impact data quality in AIS was proposed and then examined in seven Australian case studies. A detailed discussion of each factor is included, and it was found that education and training, the nature of the AIS, and top management commitment are the most critical factors. The findings of the study should help organizations focus on the important factors to obtain greater benefit from less effort. Top management, IT, and accounting professionals should be able to gain a better understanding of accounting information systems' data quality management from the discussion in this chapter.


2019 ◽  
Vol 5 (2) ◽  
pp. 139
Author(s):  
Usman Ependi ◽  
Ade Putra

Many methods can be used to predict stock levels; one of them is to process sales data with Data Mining, accompanied by the apriori algorithm, based on the purchases made by consumers and the associations between the products bought. By using the apriori algorithm, the company, in this case Regional Part Depo Auto 2000 Palembang, can stock the spare parts needed by consumers, particularly in South Sumatra, without having to place back orders (indent), given the large number of spare parts that PT. Depo Toyota must keep available to serve consumer demand in South Sumatra. The data mining stages used follow Knowledge Discovery in Databases (KDD), consisting of data cleaning and integration, data selection and transformation, data mining, and evaluation and presentation. From this process, 646 spare part association patterns were obtained from a total of 338 spare parts.
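A hedged sketch of what the apriori step could look like is shown below, using the third-party mlxtend package (an assumption; the study does not name its tooling) and invented spare-part transactions:

```python
# Illustrative apriori sketch (not the study's own code): mine frequent
# spare-part itemsets and association rules from invented transactions.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["oil_filter", "air_filter", "spark_plug"],
    ["oil_filter", "brake_pad"],
    ["air_filter", "spark_plug"],
    ["oil_filter", "air_filter"],
    ["oil_filter", "air_filter", "brake_pad"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)

frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```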


Author(s):  
Nishita Shewale

Abstract: Introducing unified information systems will provide different establishments with insight into how data-related activities take place and into their results, with assured quality. Since data accumulation, replication, missing entities, incorrect formatting, anomalies, etc. can come to light when data is collected in different information systems, and can cause an array of adverse effects on data quality, the subject of data quality deserves careful treatment. This paper examines data quality problems in information systems and introduces techniques that enable organizations to improve the quality of their data. Keywords: Information Systems (IS), Data Quality, Data Cleaning, Data Profiling, Standardization, Database, Organization
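To ground the idea, here is a minimal profiling sketch with pandas that surfaces the kinds of problems named above (missing entities, replication, incorrect formatting); the columns and the expected date format are assumptions:

```python
# Illustrative data-profiling sketch: count missing values, duplicates,
# and format violations in an assumed customer table.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 102, None, 105],
    "email":       ["a@x.com", "b@x.com", "b@x.com", "c@x.com", None],
    "signup_date": ["2021-03-01", "03/05/2021", "03/05/2021",
                    "2021-03-09", "2021-03-11"],
})

profile = pd.DataFrame({
    "missing":  df.isna().sum(),
    "distinct": df.nunique(dropna=True),
})
print(profile)
print("duplicate rows:", int(df.duplicated().sum()))

# Format check: how many signup dates violate the expected ISO format?
bad = pd.to_datetime(df["signup_date"], format="%Y-%m-%d", errors="coerce").isna()
print("non-ISO dates:", int(bad.sum()))
```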


2021 ◽  
Vol 9 (1) ◽  
pp. 46-61
Author(s):  
André Rosendorff ◽  
Alexander Hodes ◽  
Benjamin Fabian

Artificial Intelligence (AI) is becoming increasingly important in many industries due to its diverse areas of application and potential. In logistics in particular, increasing customer demands and growth in shipment volumes are leading to difficulties in forecasting delivery times, especially for the last mile. This paper explores the potential of using AI to improve delivery forecasting. For this purpose, a structured theoretical solution approach and a method for improving delivery forecasting using AI are presented. In doing so, the phases of the Cross-Industry Standard Process for Data Mining (CRISP-DM), a standard process for data mining, are adopted and discussed in detail to illustrate the complexity and importance of each task, such as data preparation or evaluation. Subsequently, by embedding the described solution into an overall system architecture for information systems, ideas are given for integrating the solution into the complexity of real information systems for logistics.
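As a hedged illustration of the CRISP-DM modeling and evaluation phases discussed above (not the paper's own model), the sketch below fits a regressor to synthetic last-mile features, assuming scikit-learn:

```python
# Illustrative sketch of CRISP-DM modeling/evaluation for last-mile
# delivery-time forecasting; features, data, and model are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1000
# Hypothetical features: distance (km), number of stops, parcel weight (kg).
X = np.column_stack([
    rng.uniform(1, 30, n),
    rng.integers(1, 15, n),
    rng.uniform(0.2, 20, n),
])
y = 3 * X[:, 0] + 4 * X[:, 1] + rng.normal(0, 5, n)   # assumed delivery minutes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
model = RandomForestRegressor(n_estimators=200, random_state=4)
model.fit(X_train, y_train)
print("MAE (minutes):", round(mean_absolute_error(y_test, model.predict(X_test)), 1))
```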

