PixelMaps: a new visual data mining approach for analyzing large spatial data sets

The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations are subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two features together: where A and B appear together, C often appears as well. This kind of pattern differs from the traditional co-location pattern. This paper therefore presents a new concept called clustering terms, and refers to the resulting patterns as co-location patterns with clustering items. Because traditional algorithms cannot mine such patterns, we introduce the related concepts in detail and propose a novel algorithm that extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
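The idea that C tends to appear where A and B occur together can be illustrated with a small sketch. This is not the algorithm proposed in the paper: the function name `conditional_colocation_rate`, the Euclidean neighbourhood test, and the sample coordinates are all assumptions made for illustration. It joins neighbouring A and B instances and then measures how many of those pairs also have a C instance nearby.

```python
from itertools import product
from math import dist


def conditional_colocation_rate(points, radius):
    """Fraction of neighbouring (A, B) pairs that also have a C within `radius`.

    `points` maps a feature name ("A", "B", "C") to a list of (x, y) tuples.
    Both the name and the measure are hypothetical, chosen for illustration.
    """
    # Step 1: join A and B instances that lie within `radius` of each other.
    ab_pairs = [
        (a, b)
        for a, b in product(points["A"], points["B"])
        if dist(a, b) <= radius
    ]
    if not ab_pairs:
        return 0.0
    # Step 2: count the pairs for which some C is close to both members.
    supported = sum(
        1
        for a, b in ab_pairs
        if any(dist(a, c) <= radius and dist(b, c) <= radius
               for c in points["C"])
    )
    return supported / len(ab_pairs)


# Made-up coordinates: three A-B neighbour pairs, two of them with a C nearby,
# plus one isolated A and one isolated B that attract no C.
points = {
    "A": [(0.0, 0.0), (5.0, 5.0), (12.0, 12.0), (9.0, 0.0)],
    "B": [(0.5, 0.0), (5.0, 5.5), (12.5, 12.0), (20.0, 20.0)],
    "C": [(0.2, 0.3), (5.2, 5.2)],
}
rate = conditional_colocation_rate(points, radius=1.0)
```

A rate close to 1 would suggest that the pair {A, B} jointly attracts C, which is the situation the abstract describes; a real miner would also need a prevalence threshold and an efficient neighbourhood join.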

Neural Network-Based Visual Data Mining for Cancer Data

Encyclopedia of Artificial Intelligence ◽ 10.4018/978-1-59904-849-9.ch176 ◽ 2011 ◽ pp. 1205-1211

Author(s): Enrique Romero ◽ Julio J. Valdés ◽ Alan J. Barton

Keyword(s): Gene Expression ◽ Data Mining ◽ Neural Networks ◽ Scientific Discovery ◽ Data Sets ◽ Visual Data ◽ Visual Data Mining ◽ Cancer Data ◽ The World ◽ Very High

According to the World Health Organization (http://www.who.int/cancer/en), cancer is a leading cause of death worldwide. Of a total of 58 million deaths in 2005, cancer accounted for 7.6 million (or 13%). The main types of cancer contributing to overall cancer mortality are i) lung (1.3 million deaths/year), ii) stomach (almost 1 million deaths/year), iii) liver (662,000 deaths/year), iv) colon (655,000 deaths/year) and v) breast (502,000 deaths/year). Among men, the most frequent cancer types worldwide are (in order of number of global deaths) lung, stomach, liver, colorectal, oesophagus and prostate, while among women (in order of number of global deaths) they are breast, lung, stomach, colorectal and cervical. Technological advancements in recent years are enabling the collection of large amounts of cancer-related data. In particular, in the field of bioinformatics, high-throughput microarray gene experiments are possible, leading to an information explosion. This requires the development of data mining procedures that speed up the process of scientific discovery and the in-depth understanding of the internal structure of the data. This is crucial for the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro & Smyth, 1996). Researchers need to understand their data rapidly and with greater ease. In general, objects under study are described in terms of collections of heterogeneous properties. It is typical for medical data to be composed of properties represented by nominal, ordinal or real-valued (scalar) variables, as well as by others of a more complex nature, such as images and time series. In addition, the information comes with different degrees of precision, uncertainty and completeness (missing data is quite common).
Classical data mining and analysis methods are sometimes difficult to use, the output of many procedures may be large and time-consuming to analyze, and their interpretation often requires special expertise. Moreover, some methods are based on assumptions about the data which limit their application, especially for the purposes of exploration, comparison, hypothesis formation, etc., typical of the first stages of scientific investigation. This makes graphical representation directly appealing. Humans perceive most information through vision, in large quantities and at very high input rates. The human brain is extremely well qualified for the fast understanding of complex visual patterns, and still outperforms the computer. Several reasons make Virtual Reality (VR) a suitable paradigm: i) it is flexible (it allows the choice of different representation models to better suit human perception preferences), ii) it allows immersion (the user can navigate inside the data and interact with the objects in the world), iii) it creates a living experience (the user is not merely a passive observer, but an actor in the world) and iv) it is broad and deep (the user may see the VR world as a whole and/or concentrate on specific details of the world). Of no less importance is the fact that only minimal skills are required to interact with a virtual world. Visualization techniques may be very useful for medical decision support in the oncology area. In this paper unsupervised neural networks are used for constructing VR spaces for visual data mining of gene expression cancer data. Three datasets are used in the paper, representative of three of the most important types of cancer in modern medicine: liver, stomach and lung. The data sets are composed of samples from normal and tumor tissues, described in terms of tens of thousands of variables, which are the corresponding gene expression intensities measured in microarray experiments.
Despite the very high dimensionality of the studied patterns, high-quality visual representations in the form of structure-preserving VR spaces are obtained using SAMANN neural networks, which enable the differentiation of cancerous and noncancerous tissues. The same networks could be used as nonlinear feature generators in a preprocessing step for other data mining procedures.
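SAMANN networks are usually described as neural implementations of Sammon's nonlinear mapping. Assuming that Sammon's stress is the structure-preservation criterion meant here (the abstract does not state it), the following minimal sketch shows how the quality of a low-dimensional embedding could be scored. The function name `sammon_stress` and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
from math import dist


def sammon_stress(high_dim, low_dim):
    """Sammon's stress between original points and their embedding.

    0 means all pairwise distances are preserved exactly; larger values
    mean more distortion. Assumes no two original points coincide.
    """
    pairs = list(combinations(range(len(high_dim)), 2))
    d_high = [dist(high_dim[i], high_dim[j]) for i, j in pairs]
    d_low = [dist(low_dim[i], low_dim[j]) for i, j in pairs]
    # Each squared error is weighted by 1 / original distance, so that
    # small (local) distances count more than large ones.
    weighted_error = sum((dh - dl) ** 2 / dh for dh, dl in zip(d_high, d_low))
    return weighted_error / sum(d_high)


# Toy data: three 4-dimensional samples and two candidate 3D (VR-space) layouts.
samples = [(0.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0), (6.0, 8.0, 0.0, 0.0)]
faithful = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
collapsed = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

stress_faithful = sammon_stress(samples, faithful)
stress_collapsed = sammon_stress(samples, collapsed)
```

The faithful layout keeps every pairwise distance and scores zero, while the collapsed layout merges two samples and is penalised. A SAMANN-style network would be trained to reduce this quantity; the training procedure itself is not shown.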

A visual data-mining methodology for seismic facies analysis: Part 2 — Application to 3D seismic data

Geophysics ◽ 10.1190/1.3046456 ◽ 2009 ◽ Vol 74 (1) ◽ pp. P13-P23 ◽ Cited By ~ 20

Author(s): Iván Dimitri Marroquín ◽ Jean-Jules Brault ◽ Bruce S. Hart

Keyword(s): Data Mining ◽ Seismic Data ◽ Clustering Analysis ◽ Facies Analysis ◽ Seismic Facies ◽ Visual Data ◽ Visual Data Mining ◽ Data Mining Approach ◽ Seismic Facies Analysis ◽ 3D Seismic Data

A visual data-mining approach to unsupervised clustering analysis can be an effective tool for visualizing and understanding patterns inherent in seismic data (i.e., seismic facies). The unsupervised clustering analysis is completely data-driven, requiring no external information (e.g., well logs) to guide the seismic-trace classification. We demonstrate the application of the visual data-mining approach to seismic facies analysis on a real 3D seismic data volume. We select two stratigraphic intervals, the first including a Devonian pinnacle reef system and the second containing a Jurassic siliciclastic channel system. Both analyses show major stratigraphic features that can be defined in horizon slices or other types of visualization. However, the visual data-mining approach creates seismic facies maps with improved visual detail, distinguishing seismic trace-shape variability in the data. We also compare the facies maps with those obtained from a commercial package for seismic facies classification. Both approaches created similar facies maps, but the visual strategy better depicts subtle stratigraphic changes in the bodies being imaged, offering insight into the nature of these features.
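To make "completely data-driven" trace classification concrete, here is a minimal sketch that groups seismic traces by waveform shape alone, without well logs or labels. Plain k-means is substituted for the authors' method, which the abstract does not specify; the helper names `normalize` and `kmeans`, the seeding rule, and the four-sample traces are all assumptions for illustration.

```python
from math import dist


def normalize(trace):
    """Scale a trace to unit peak amplitude so only its shape matters."""
    peak = max(abs(s) for s in trace) or 1.0
    return [s / peak for s in trace]


def kmeans(traces, k, iterations=20):
    """Plain k-means; the first k traces seed the centroids (deterministic)."""
    centroids = [list(t) for t in traces[:k]]
    labels = [0] * len(traces)
    for _ in range(iterations):
        # Assign each trace to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(t, centroids[c])) for t in traces
        ]
        # Move each centroid to the mean of the traces assigned to it.
        for c in range(k):
            members = [t for t, lab in zip(traces, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up traces: two peak-then-trough shapes and two trough-then-peak shapes,
# recorded at different amplitudes.
raw_traces = [
    [0.0, 2.0, 0.0, -2.0],
    [0.0, -1.0, 0.0, 1.0],
    [0.0, 4.1, 0.2, -4.0],
    [0.1, -3.0, 0.0, 2.9],
]
labels = kmeans([normalize(t) for t in raw_traces], k=2)
```

Writing each label back to the map position of its trace would give a simple facies map. The amplitude normalisation is a design choice that makes the grouping depend on trace shape rather than signal strength.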
