Visualization Techniques for Data Mining

2008 ◽  
pp. 1623-1630
Author(s):  
Herna L. Viktor ◽  
Eric Paquet

The current explosion of data and information, mainly caused by data warehousing technologies as well as the extensive use of the Internet and its related technologies, has increased the urgent need for the development of techniques for intelligent data analysis. Data mining, which concerns the discovery and extraction of knowledge chunks from large data repositories, is aimed at addressing this need. Data mining automates the discovery of hidden patterns and relationships that may not always be obvious. Data mining tools include classification techniques (such as decision trees, rule induction programs and neural networks) (Han & Kamber, 2001), clustering algorithms and association rule approaches, amongst others.


Author(s):  
Herna L. Viktor ◽  
Eric Paquet

The current explosion of data and information, mainly caused by the continuous adoption of data warehouses and the extensive use of the Internet and its related technologies, has increased the urgent need for the development of techniques for intelligent data analysis. Data mining, which concerns the discovery and extraction of knowledge chunks from large data repositories, addresses this need. Data mining automates the discovery of hidden patterns and relationships that may not always be obvious. Data mining tools include classification techniques (such as decision trees, rule induction programs and neural networks) (Kou et al., 2007), clustering algorithms and association rule approaches, amongst others. Data mining has been fruitfully used in many domains, including marketing, medicine, finance, engineering and bioinformatics. There are still, however, a number of factors that militate against the widespread adoption and use of this new technology, mainly because the results of many data mining techniques are often difficult to understand. For example, the results of a data mining effort producing 300 pages of rules will be difficult to analyze. The visual representation of the knowledge embedded in such rules helps to heighten the comprehensibility of the results. Visualization of the data itself, as well as of the data mining process, should go a long way towards increasing the user’s understanding of, and faith in, the data mining process. That is, data and information visualization give users the ability to obtain new insights into the knowledge discovered from large repositories. This paper describes a number of important visual data mining issues and introduces techniques employed to improve the understandability of the results of data mining. Firstly, the visualization of data prior to, and during, data mining is addressed. Through data visualization, the quality of the data can be assessed throughout the knowledge discovery process, which includes data preprocessing, data mining and reporting. We also discuss information visualization, i.e., how the knowledge discovered by a data mining tool may be visualized throughout the data mining process. This aspect includes visualization of the results of data mining as well as of the learning process. In addition, the paper shows how virtual reality and collaborative virtual environments may be used to obtain an immersive perspective of the data and the data mining process, and how visual data mining can be used to directly mine functionality, with specific applications in the emerging field of proteomics.
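
The kind of pre-mining data visualization described above can be illustrated with a short sketch. The file name and the choice of views below are assumptions for illustration only, not the authors' tooling; a missing-value summary and a scatter-plot matrix are just two of many possible quality checks.

```python
# Minimal sketch (hypothetical file name): inspect data quality visually
# before any mining algorithm is run, as discussed above.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")   # hypothetical input file

# Tabular quality summary: missing values and basic statistics per column.
print(df.isna().sum())
print(df.describe())

# Visual summary: a scatter-plot matrix reveals outliers, skew and
# obvious correlations in the numeric attributes.
pd.plotting.scatter_matrix(df.select_dtypes("number"), figsize=(8, 8), diagonal="hist")
plt.tight_layout()
plt.show()
```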


Author(s):  
Scott Nicholson ◽  
Jeffrey Stanton

Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a campus. These notions greatly oversimplify the world of libraries, however. Most large commercial organizations have dedicated in-house library operations, as do schools, non-governmental organizations, as well as local, state, and federal governments. With the increasing use of the Internet and the World Wide Web, digital libraries have burgeoned, and these serve a huge variety of different user audiences. With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining: By ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library’s host institution. Use of data mining to examine library data might be aptly termed bibliomining. With widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data mining techniques—advanced statistical and visualization methods to locate non-trivial patterns in large data sets. Bibliomining refers to the use of these bibliometric and data mining techniques to explore the enormous quantities of data generated by the typical automated library.


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

With the advancement of storage, retrieval, and network technologies, the amount of information available to each organization today is literally exploding. It is widely recognized that data as an organizational asset often becomes a liability when the cost of acquiring and managing those data exceeds the value derived from them. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful, actionable knowledge from it. To explore and analyze large data repositories and discover useful, actionable knowledge from them, modern organizations use a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, the discovered patterns are statistically meaningful and may often disclose sensitive information. As a result, privacy has become one of the prime concerns in the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaching data privacy happens more often than it does in centralized environments.
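
The privacy tension in distributed mining noted above is often addressed with simple masking or cryptographic primitives. Below is a minimal sketch of one classic illustration, a secure-sum style protocol in which sites jointly compute a total count without exposing any single site's value; it is an illustrative toy, not the authors' proposed method.

```python
# Toy secure-sum sketch: sites jointly compute a total (e.g., the support
# count of an itemset) without any site revealing its own local count.
# Illustrative primitive only, not the method proposed by the authors.
import random

def secure_sum(local_counts, modulus=10**9):
    """Ring-based secure sum over the given per-site counts."""
    # The initiating site masks its value with a random offset.
    offset = random.randrange(modulus)
    running = (offset + local_counts[0]) % modulus
    # Each remaining site adds its own count to the masked running total.
    for count in local_counts[1:]:
        running = (running + count) % modulus
    # The initiator removes the offset to recover the true total.
    return (running - offset) % modulus

print(secure_sum([120, 45, 310]))  # -> 475, with no single count exposed in transit
```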


2017 ◽  
Vol 7 (1.3) ◽  
pp. 37
Author(s):  
Joy Christy A.

Data mining refers to the extraction of meaningful knowledge from large data sources, which may contain hidden potential facts. In general, data mining analysis can be either predictive or descriptive. Predictive analysis infers from existing results in order to identify future outputs, whereas descriptive analysis interprets the intrinsic characteristics or nature of the data. Clustering is one of the descriptive analysis techniques of data mining; it groups objects of similar types in such a way that objects in a cluster are closer to each other than to objects of other clusters. K-means is the most popular and widely used clustering algorithm. It starts by selecting k random initial centroids, where k is the number of clusters given by the user. It then computes the distance between the initial centroids and the remaining data objects and groups each data object into the cluster whose centroid is at minimum distance. This process is repeated until there is no change in the cluster centroids or cluster members. However, k-means still suffers from several issues, such as choosing the optimal number of clusters k, random initial centroids, an unknown number of iterations, reaching globally optimal cluster solutions and, more importantly, creating meaningful clusters when dealing with datasets from various domains. The accuracy involved in clustering should never be compromised. Thus, in this paper, a novel classification-via-clustering algorithm called Iterative Linear Regression Clustering with Percentage Split Distribution (ILRCPSD) is introduced as an alternative solution to the problems encountered in traditional clustering algorithms. The proposed algorithm is examined over an educational dataset to identify hidden groups of students having similar cognitive and competency skills. The performance of the proposed algorithm is compared with the accuracy of traditional k-means clustering in terms of building meaningful clusters and to prove its real-world usefulness.
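
Since the abstract above walks through the standard k-means loop, a compact sketch of that loop may help. It follows the textbook algorithm (random initial centroids, nearest-centroid assignment, centroid update until convergence), not the ILRCPSD algorithm proposed in the paper, and the toy data points are invented.

```python
# Minimal textbook k-means, matching the loop described above (not the
# proposed ILRCPSD algorithm): pick k random centroids, assign each point
# to its nearest centroid, recompute centroids, repeat until stable.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance from every point to every centroid, then nearest-centroid labels.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.5, 7.5], [1.2, 2.2], [7.8, 8.3]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```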


2008 ◽  
pp. 3694-3699
Author(s):  
William Perrizo ◽  
Qiang Ding ◽  
Masum Serazi ◽  
Taufik Abidin ◽  
Baoying Wang

For several decades, and especially with the preeminence of relational database systems, data has almost always been formed into horizontal record structures and then processed vertically (vertical scans of files of horizontal records). This makes good sense when the requested result is a set of horizontal records. In knowledge discovery and data mining, however, researchers are typically interested in collective properties or predictions that can be expressed very briefly. Therefore, the approaches for scan-based processing of horizontal records are known to be inadequate for data mining in very large data repositories (Han & Kamber, 2001; Han, Pei, & Yin, 2000; Shafer, Agrawal, & Mehta, 1996).
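
The contrast drawn above between scanning horizontal records and working with vertically organized data can be illustrated with a toy aggregate. The dictionary-of-lists column layout below is an assumption for illustration; it does not reproduce any particular vertical data structure from the paper.

```python
# Toy contrast between a horizontal (record-oriented) layout and a vertical
# (column-oriented) layout for one aggregate query.
rows = [  # horizontal layout: one record per entity
    {"age": 34, "income": 52000, "bought": 1},
    {"age": 45, "income": 61000, "bought": 0},
    {"age": 29, "income": 48000, "bought": 1},
]

# Record-oriented processing: scan whole records even though only one field is needed.
total_row = sum(r["bought"] for r in rows)

# Column-oriented (vertical) layout: each attribute stored as its own sequence,
# so the same aggregate touches only the relevant column.
columns = {"age": [34, 45, 29], "income": [52000, 61000, 48000], "bought": [1, 0, 1]}
total_col = sum(columns["bought"])

print(total_row, total_col)  # both 2
```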


Author(s):  
Vasudha Bhatnagar ◽  
S. K. Gupta

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is one of mapping low-level data (operational in nature and too voluminous) to a more abstract form (a descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with the pro-active intervention of the domain experts, data mining analysts and the end-users. It is a ‘continuous’ process in the sense that the results of the process may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning of the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the process of knowledge discovery in databases (KDD process), and present some models (conceptual as well as practical) to carry out the KDD endeavor.
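
A multi-step KDD process of the kind described above is often sketched as a chain of stages. The stage names below follow the common Fayyad-style decomposition (selection, preprocessing, transformation, mining, interpretation); the function bodies are placeholders, not a prescription of the models presented in the chapter.

```python
# Skeleton of a multi-step KDD process; stage bodies are placeholders.
def select(raw):          # choose the target data relevant to the analysis goal
    return raw

def preprocess(data):     # clean: handle noise, missing values, inconsistencies
    return data

def transform(data):      # reduce or project the data into a form suited to the mining task
    return data

def mine(data):           # apply the chosen data mining algorithm
    return {"patterns": []}

def interpret(model):     # evaluate and present the patterns as usable knowledge
    return model

def kdd_process(raw):
    # The process is iterative: results may send the analyst back to earlier steps.
    return interpret(mine(transform(preprocess(select(raw)))))
```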


Author(s):  
Anisa Anisa ◽  
Mesran Mesran

Data mining is the mining or discovery of information: the process of looking for patterns or trends in very large amounts of data to support decisions about the future. In determining such patterns, classification techniques work from gathered records (a training set) with a class attribute; a decision tree built with the C4.5 method rests on an induction algorithm and can be pruned (minimised). By utilizing data on graduates’ jobs, gathered from an alumni questionnaire, the study is expected to generate information about interests, talents and work. Patterns of work are sought from large-scale data and analyzed with the C4.5 algorithm, which investigates the attributes that influence the classification of objects into different classes or categories, so that interconnected rules shaping the patterns of work are found. The application used is Tanagra, data mining software intended for academic and research purposes, which contains data mining methods ranging from data analysis to classification.
Keywords: analysis, Data Mining, C4.5 method, Tanagra, patterns of work
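
A decision-tree classification of the kind described above can be sketched with an entropy-based tree. scikit-learn’s entropy criterion belongs to the same information-gain family as C4.5 but is not the C4.5 implementation used in Tanagra, and the alumni attributes and class labels below are invented for illustration.

```python
# Hedged sketch: an entropy-based decision tree over hypothetical alumni data;
# not C4.5 itself and not the Tanagra workflow described above.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

alumni = pd.DataFrame({
    "gpa":           [3.2, 2.8, 3.7, 3.0, 3.9, 2.5],
    "internship":    [1, 0, 1, 1, 1, 0],
    "soft_skills":   [4, 3, 5, 2, 5, 2],                              # 1-5 self-rating
    "field_of_work": ["IT", "admin", "IT", "admin", "IT", "admin"],   # class attribute
})

X = alumni[["gpa", "internship", "soft_skills"]]
y = alumni["field_of_work"]

# Fit the tree and print its rules in readable if/then form.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```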


Author(s):  
D T Pham ◽  
A A Afify

Clustering is an important data exploration technique with many applications in different areas of engineering, including engineering design, manufacturing system design, quality assurance, production planning and process planning, modelling, monitoring, and control. The clustering problem has been addressed by researchers from many disciplines. However, efforts to perform effective and efficient clustering on large data sets only started in recent years with the emergence of data mining. The current paper presents an overview of clustering algorithms from a data mining perspective. Attention is paid to techniques of scaling up these algorithms to handle large data sets. The paper also describes a number of engineering applications to illustrate the potential of clustering algorithms as a tool for handling complex real-world problems.


2021 ◽  
Vol 8 (3) ◽  
pp. 65-70
Author(s):  
Mohamad Mohamad Shamie ◽  
Muhammad Mazen Almustafa

Data mining is a process of knowledge discovery that extracts interesting, previously unknown, potentially useful, and nontrivial patterns from large data sets. Currently, there is an increasing interest in data mining of traffic accidents, which makes it a growing new research community. The large number of traffic accidents in recent years has generated large amounts of traffic accident data. Mining algorithms, especially association rule algorithms, have played a great role in determining the causes of these accidents. One challenging problem in data mining is effective association rule mining over huge transactional databases, and many efforts have been made to propose and improve association rule mining methods. In this paper, we use the RapidMiner application to design a process that can generate association rules based on clustering algorithms.
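
Since the abstract above turns on generating association rules from transactional data, a minimal support/confidence computation may help. It is a brute-force toy over invented accident-style transactions, not the RapidMiner process the authors design.

```python
# Brute-force toy for association rules: enumerate itemsets, keep those above
# a support threshold, and emit rules above a confidence threshold.
from itertools import combinations

transactions = [  # invented accident-style transactions
    {"rain", "night", "speeding"},
    {"rain", "speeding"},
    {"night", "speeding"},
    {"rain", "night"},
]
min_support, min_confidence = 0.5, 0.6
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / n

items = set().union(*transactions)
frequent = [frozenset(c) for size in (1, 2, 3)
            for c in combinations(items, size) if support(set(c)) >= min_support]

for itemset in frequent:
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                print(set(antecedent), "=>", set(itemset - antecedent),
                      f"support={support(itemset):.2f} confidence={conf:.2f}")
```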

