Visualization Techniques for Data Mining

2008 ◽  
pp. 1623-1630
Author(s):  
Herna L. Viktor ◽  
Eric Paquet

The current explosion of data and information, mainly caused by data warehousing technologies as well as the extensive use of the Internet and its related technologies, has increased the urgent need for the development of techniques for intelligent data analysis. Data mining, which concerns the discovery and extraction of knowledge chunks from large data repositories, is aimed at addressing this need. Data mining automates the discovery of hidden patterns and relationships that may not always be obvious. Data mining tools include classification techniques (such as decision trees, rule induction programs and neural networks) (Han & Kamber, 2001), clustering algorithms and association rule approaches, amongst others.


Author(s):  
Herna L. Viktor ◽  
Eric Paquet

The current explosion of data and information, mainly caused by the continuous adoption of data warehouses and the extensive use of the Internet and its related technologies, has increased the urgent need for the development of techniques for intelligent data analysis. Data mining, which concerns the discovery and extraction of knowledge chunks from large data repositories, addresses this need. Data mining automates the discovery of hidden patterns and relationships that may not always be obvious. Data mining tools include classification techniques (such as decision trees, rule induction programs and neural networks) (Kou et al., 2007), clustering algorithms and association rule approaches, amongst others. Data mining has been fruitfully used in many domains, including marketing, medicine, finance, engineering and bioinformatics. There are still, however, a number of factors that militate against the widespread adoption and use of this new technology, mainly because the results of many data mining techniques are often difficult to understand. For example, the results of a data mining effort producing 300 pages of rules will be difficult to analyze. The visual representation of the knowledge embedded in such rules helps to heighten the comprehensibility of the results. Visualization of the data itself, as well as of the data mining process, should go a long way towards increasing the user’s understanding of, and faith in, the data mining process. That is, data and information visualization give users the ability to obtain new insights into the knowledge discovered from large repositories. This paper describes a number of important visual data mining issues and introduces techniques employed to improve the understandability of the results of data mining. Firstly, the visualization of data prior to, and during, data mining is addressed. Through data visualization, the quality of the data can be assessed throughout the knowledge discovery process, which includes data preprocessing, data mining and reporting. We also discuss information visualization, i.e., how the knowledge discovered by a data mining tool may be visualized throughout the data mining process. This aspect includes visualization of the results of data mining as well as of the learning process. In addition, the paper shows how virtual reality and collaborative virtual environments may be used to obtain an immersive perspective of the data and the data mining process, and how visual data mining can be used to directly mine functionality, with specific applications in the emerging field of proteomics.
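
The kind of pre-mining data visualization described above can be illustrated with a short sketch. The file name and the choice of views below are assumptions for illustration only, not the authors' tooling; a missing-value summary and a scatter-plot matrix are just two of many possible quality checks.

```python
# Minimal sketch (hypothetical file name): inspect data quality visually
# before any mining algorithm is run, as discussed above.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")   # hypothetical input file

# Tabular quality summary: missing values and basic statistics per column.
print(df.isna().sum())
print(df.describe())

# Visual summary: a scatter-plot matrix reveals outliers, skew and
# obvious correlations in the numeric attributes.
pd.plotting.scatter_matrix(df.select_dtypes("number"), figsize=(8, 8), diagonal="hist")
plt.tight_layout()
plt.show()
```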


Author(s):  
Scott Nicholson ◽  
Jeffrey Stanton

Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a campus. These notions greatly oversimplify the world of libraries, however. Most large commercial organizations have dedicated in-house library operations, as do schools, non-governmental organizations, as well as local, state, and federal governments. With the increasing use of the Internet and the World Wide Web, digital libraries have burgeoned, and these serve a huge variety of different user audiences. With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining: By ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library’s host institution. Use of data mining to examine library data might be aptly termed bibliomining. With widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data mining techniques—advanced statistical and visualization methods to locate non-trivial patterns in large data sets. Bibliomining refers to the use of these bibliometric and data mining techniques to explore the enormous quantities of data generated by the typical automated library.


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

With the advancement of storage, retrieval, and network technologies, the amount of information available to each organization today is literally exploding. It is widely recognized that data as an organizational asset often becomes a liability when the cost of acquiring and managing those data exceeds the value derived from them. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful, actionable knowledge from it. To explore and analyze large data repositories and discover useful, actionable knowledge from them, modern organizations use a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, the discovered patterns are statistically meaningful and may often disclose sensitive information. As a result, privacy has become one of the prime concerns in the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaching data privacy happens more often than it does in centralized environments.
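
The privacy tension in distributed mining noted above is often addressed with simple masking or cryptographic primitives. Below is a minimal sketch of one classic illustration, a secure-sum style protocol in which sites jointly compute a total count without exposing any single site's value; it is an illustrative toy, not the authors' proposed method.

```python
# Toy secure-sum sketch: sites jointly compute a total (e.g., the support
# count of an itemset) without any site revealing its own local count.
# Illustrative primitive only, not the method proposed by the authors.
import random

def secure_sum(local_counts, modulus=10**9):
    """Ring-based secure sum over the given per-site counts."""
    # The initiating site masks its value with a random offset.
    offset = random.randrange(modulus)
    running = (offset + local_counts[0]) % modulus
    # Each remaining site adds its own count to the masked running total.
    for count in local_counts[1:]:
        running = (running + count) % modulus
    # The initiator removes the offset to recover the true total.
    return (running - offset) % modulus

print(secure_sum([120, 45, 310]))  # -> 475, with no single count exposed in transit
```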


2017 ◽  
Vol 7 (1.3) ◽  
pp. 37
Author(s):  
Joy Christy A.

Data mining refers to the extraction of meaningful knowledge from large data sources, which may contain hidden potential facts. In general, data mining analysis can be either predictive or descriptive. Predictive analysis infers from existing results in order to identify future outputs, whereas descriptive analysis interprets the intrinsic characteristics or nature of the data. Clustering is one of the descriptive analysis techniques of data mining; it groups objects of similar types in such a way that objects in a cluster are closer to each other than to objects of other clusters. K-means is the most popular and widely used clustering algorithm. It starts by selecting k random initial centroids, where k is the number of clusters given by the user. It then computes the distance between the initial centroids and the remaining data objects and groups each data object into the cluster whose centroid is at minimum distance. This process is repeated until there is no change in the cluster centroids or cluster members. However, k-means still suffers from several issues, such as choosing the optimal number of clusters k, random initial centroids, an unknown number of iterations, reaching globally optimal cluster solutions and, more importantly, creating meaningful clusters when dealing with datasets from various domains. The accuracy involved in clustering should never be compromised. Thus, in this paper, a novel classification-via-clustering algorithm called Iterative Linear Regression Clustering with Percentage Split Distribution (ILRCPSD) is introduced as an alternative solution to the problems encountered in traditional clustering algorithms. The proposed algorithm is examined over an educational dataset to identify hidden groups of students having similar cognitive and competency skills. The performance of the proposed algorithm is compared with the accuracy of traditional k-means clustering in terms of building meaningful clusters and to prove its real-world usefulness.
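
Since the abstract above walks through the standard k-means loop, a compact sketch of that loop may help. It follows the textbook algorithm (random initial centroids, nearest-centroid assignment, centroid update until convergence), not the ILRCPSD algorithm proposed in the paper, and the toy data points are invented.

```python
# Minimal textbook k-means, matching the loop described above (not the
# proposed ILRCPSD algorithm): pick k random centroids, assign each point
# to its nearest centroid, recompute centroids, repeat until stable.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance from every point to every centroid, then nearest-centroid labels.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.5, 7.5], [1.2, 2.2], [7.8, 8.3]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```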


2008 ◽  
pp. 3694-3699
Author(s):  
William Perrizo ◽  
Qiang Ding ◽  
Masum Serazi ◽  
Taufik Abidin ◽  
Baoying Wang

For several decades, and especially with the preeminence of relational database systems, data has almost always been formed into horizontal record structures and then processed vertically (vertical scans of files of horizontal records). This makes good sense when the requested result is a set of horizontal records. In knowledge discovery and data mining, however, researchers are typically interested in collective properties or predictions that can be expressed very briefly. Therefore, the approaches for scan-based processing of horizontal records are known to be inadequate for data mining in very large data repositories (Han & Kamber, 2001; Han, Pei, & Yin, 2000; Shafer, Agrawal, & Mehta, 1996).
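
The contrast drawn above between scanning horizontal records and working with vertically organized data can be illustrated with a toy aggregate. The dictionary-of-lists column layout below is an assumption for illustration; it does not reproduce any particular vertical data structure from the paper.

```python
# Toy contrast between a horizontal (record-oriented) layout and a vertical
# (column-oriented) layout for one aggregate query.
rows = [  # horizontal layout: one record per entity
    {"age": 34, "income": 52000, "bought": 1},
    {"age": 45, "income": 61000, "bought": 0},
    {"age": 29, "income": 48000, "bought": 1},
]

# Record-oriented processing: scan whole records even though only one field is needed.
total_row = sum(r["bought"] for r in rows)

# Column-oriented (vertical) layout: each attribute stored as its own sequence,
# so the same aggregate touches only the relevant column.
columns = {"age": [34, 45, 29], "income": [52000, 61000, 48000], "bought": [1, 0, 1]}
total_col = sum(columns["bought"])

print(total_row, total_col)  # both 2
```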


Author(s):  
Vasudha Bhatnagar ◽  
S. K. Gupta

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is one of mapping low-level data (operational in nature and too voluminous) to a more abstract form (a descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with the pro-active intervention of the domain experts, data mining analysts and the end-users. It is a ‘continuous’ process in the sense that the results of the process may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning of the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the process of knowledge discovery in databases (KDD process), and present some models (conceptual as well as practical) to carry out the KDD endeavor.
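
A multi-step KDD process of the kind described above is often sketched as a chain of stages. The stage names below follow the common Fayyad-style decomposition (selection, preprocessing, transformation, mining, interpretation); the function bodies are placeholders, not a prescription of the models presented in the chapter.

```python
# Skeleton of a multi-step KDD process; stage bodies are placeholders.
def select(raw):          # choose the target data relevant to the analysis goal
    return raw

def preprocess(data):     # clean: handle noise, missing values, inconsistencies
    return data

def transform(data):      # reduce or project the data into a form suited to the mining task
    return data

def mine(data):           # apply the chosen data mining algorithm
    return {"patterns": []}

def interpret(model):     # evaluate and present the patterns as usable knowledge
    return model

def kdd_process(raw):
    # The process is iterative: results may send the analyst back to earlier steps.
    return interpret(mine(transform(preprocess(select(raw)))))
```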


Author(s):  
Anisa Anisa ◽  
Mesran Mesran

Data mining is the mining or discovery of information: the process of looking for patterns or trends in very large amounts of data to support decisions about the future. In determining such patterns, classification techniques work from gathered records (a training set) with a class attribute; a decision tree built with the C4.5 method rests on an induction algorithm and can be pruned (minimised). By utilizing data on graduates’ jobs, gathered from an alumni questionnaire, the study is expected to generate information about interests, talents and work. Patterns of work are sought from large-scale data and analyzed with the C4.5 algorithm, which investigates the attributes that influence the classification of objects into different classes or categories, so that interconnected rules shaping the patterns of work are found. The application used is Tanagra, data mining software intended for academic and research purposes, which contains data mining methods ranging from data analysis to classification.
Keywords: analysis, Data Mining, C4.5 method, Tanagra, patterns of work
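
A decision-tree classification of the kind described above can be sketched with an entropy-based tree. scikit-learn’s entropy criterion belongs to the same information-gain family as C4.5 but is not the C4.5 implementation used in Tanagra, and the alumni attributes and class labels below are invented for illustration.

```python
# Hedged sketch: an entropy-based decision tree over hypothetical alumni data;
# not C4.5 itself and not the Tanagra workflow described above.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

alumni = pd.DataFrame({
    "gpa":           [3.2, 2.8, 3.7, 3.0, 3.9, 2.5],
    "internship":    [1, 0, 1, 1, 1, 0],
    "soft_skills":   [4, 3, 5, 2, 5, 2],                              # 1-5 self-rating
    "field_of_work": ["IT", "admin", "IT", "admin", "IT", "admin"],   # class attribute
})

X = alumni[["gpa", "internship", "soft_skills"]]
y = alumni["field_of_work"]

# Fit the tree and print its rules in readable if/then form.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```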


Author(s):  
D T Pham ◽  
A A Afify

Clustering is an important data exploration technique with many applications in different areas of engineering, including engineering design, manufacturing system design, quality assurance, production planning and process planning, modelling, monitoring, and control. The clustering problem has been addressed by researchers from many disciplines. However, efforts to perform effective and efficient clustering on large data sets only started in recent years with the emergence of data mining. The current paper presents an overview of clustering algorithms from a data mining perspective. Attention is paid to techniques of scaling up these algorithms to handle large data sets. The paper also describes a number of engineering applications to illustrate the potential of clustering algorithms as a tool for handling complex real-world problems.


2021 ◽  
Vol 8 (3) ◽  
pp. 65-70
Author(s):  
Mohamad Mohamad Shamie ◽  
Muhammad Mazen Almustafa

Data mining is a process of knowledge discovery that extracts interesting, previously unknown, potentially useful, and nontrivial patterns from large data sets. Currently, there is an increasing interest in data mining of traffic accidents, which makes it a growing new research community. The large number of traffic accidents in recent years has generated large amounts of traffic accident data. Mining algorithms, especially association rule algorithms, have played a great role in determining the causes of these accidents. One challenging problem in data mining is effective association rule mining over huge transactional databases, and many efforts have been made to propose and improve association rule mining methods. In this paper, we use the RapidMiner application to design a process that can generate association rules based on clustering algorithms.
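
Since the abstract above turns on generating association rules from transactional data, a minimal support/confidence computation may help. It is a brute-force toy over invented accident-style transactions, not the RapidMiner process the authors design.

```python
# Brute-force toy for association rules: enumerate itemsets, keep those above
# a support threshold, and emit rules above a confidence threshold.
from itertools import combinations

transactions = [  # invented accident-style transactions
    {"rain", "night", "speeding"},
    {"rain", "speeding"},
    {"night", "speeding"},
    {"rain", "night"},
]
min_support, min_confidence = 0.5, 0.6
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / n

items = set().union(*transactions)
frequent = [frozenset(c) for size in (1, 2, 3)
            for c in combinations(items, size) if support(set(c)) >= min_support]

for itemset in frequent:
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                print(set(antecedent), "=>", set(itemset - antecedent),
                      f"support={support(itemset):.2f} confidence={conf:.2f}")
```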

