Efficient attribute selection strategies for association rule mining in high dimensional data

Summary: Because of computational advances in reservoir simulation with high-performance computing, it is now possible to simulate thousands of reservoir-simulation cases in a practical time frame. This opens a new avenue to reservoir-simulation studies, enabling exhaustive exploration of subsurface uncertainty and development/depletion options. However, analyzing the results of a large number of simulation cases remains a challenging and overwhelming task. We propose a new method that enables the efficient analysis of massive reservoir-simulation results, often consisting of thousands of cases, by discovering interesting patterns of relationships among variables in large data sets. The method uses a well-known data-mining method, called association-rule mining, together with a high-dimensional visualization technique. We demonstrate the capability of the proposed method by using it to analyze the reservoir-simulation results from the Sensitivity Analysis of the Impact of Geological Uncertainty on Production (SAIGUP) project, which is an interdisciplinary reservoir-modeling project carried out earlier by Manzocchi et al. (2008a). To investigate the influence of geological features on oil recovery in shallow marine reservoirs, numerous reservoir models were built and flow-simulated in the SAIGUP project. In this paper, we analyze the simulation results from an ensemble of 9,072 models, which cover all possible combinations of several structural and sedimentological parameters individually varied to describe geological uncertainty. To analyze the simulation results from such exhaustive sampling of a high-dimensional model parameter space, it is crucial to efficiently decompose complex interactions between model parameters and to discover hidden impacts on flow response. By coupling the association-rule mining algorithm and high-dimensional visualization, such interactions and impacts are rapidly extracted and visualized in such a way that engineers and geoscientists can interpret meaningful sensitivities “at a glance.” This methodology provides a novel way to rapidly interpret flow response from a large ensemble of reservoir models without being overwhelmed by massive data.
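The idea of mining rules over an ensemble that covers every combination of parameter levels can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the parameter names and levels in `LEVELS` are hypothetical (they are not the SAIGUP parameters), `toy_recovery` is a stand-in for a flow simulation, and rule confidence is used as one simple interestingness measure. It is not the authors' implementation.

```python
from itertools import product

# Hypothetical parameters and levels, chosen only for illustration.
LEVELS = {
    "fault_density": ["low", "medium", "high"],
    "aggradation": ["low", "high"],
    "curvature": ["flat", "curved"],
}


def toy_recovery(case):
    """Stand-in for a flow simulation; returns a discretized recovery class."""
    if case["fault_density"] == "high" and case["aggradation"] == "low":
        return "low"
    return "high"


def build_ensemble(levels):
    """Enumerate every combination of parameter levels (full factorial)."""
    names = list(levels)
    cases = []
    for combo in product(*(levels[name] for name in names)):
        case = dict(zip(names, combo))
        case["recovery"] = toy_recovery(case)
        cases.append(case)
    return cases


def matches(case, conditions):
    """True if the case satisfies every attribute=value condition."""
    return all(case[key] == value for key, value in conditions.items())


def confidence(cases, antecedent, consequent):
    """Fraction of cases matching the antecedent that also match the consequent."""
    covered = [c for c in cases if matches(c, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(c, consequent) for c in covered) / len(covered)


cases = build_ensemble(LEVELS)
low = {"recovery": "low"}
print(len(cases))
print(confidence(cases, {"fault_density": "high"}, low))
print(confidence(cases, {"fault_density": "high", "aggradation": "low"}, low))
```

In this toy setup the two-parameter rule has higher confidence than the single-parameter rule, which is the kind of parameter interaction that rule mining is meant to surface.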

Using Association Rule Mining and High-Dimensional Visualization to Explore the Impact of Geological Features on Dynamic Flow Behavior

10.2118/174774-ms, 2015, Cited By ~ 1

Author(s): Satomi Suzuki, Dave Stern, Tom Manzocchi

Keyword(s): Association Rule, Association Rule Mining, Flow Behavior, High Dimensional, Dynamic Flow, Rule Mining, Geological Features, The Impact

Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets

Algorithms, 10.3390/a15010021, 2022, Vol 15 (1), pp. 21

Author(s): Consolata Gakii, Paul O. Mireji, Richard Rimiru

Keyword(s): Feature Selection, Association Rule, Association Rule Mining, Principal Component, Recursive Feature Elimination, High Dimensional, Rule Mining, Rnaseq Data, Feature Selection Approach, Feature Selection Techniques

Analysis of high-dimensional data, with more features (p) than observations (N), that is, p > N, places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers in both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNAseq data.
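The discretize-then-mine step described in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the gene names, expression values, and the fixed cut-off of 1.0 are hypothetical, and the binary high/low binning is only one possible discretization. Support and lift follow their standard definitions: support is the fraction of samples containing an itemset, and lift is the joint support divided by the product of the individual supports.

```python
def discretize(values, threshold):
    """Map numeric values to 'high'/'low' labels using a fixed cut-off."""
    return ["high" if v >= threshold else "low" for v in values]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def lift(transactions, antecedent, consequent):
    """support(A and C) / (support(A) * support(C)); 0.0 if undefined."""
    denominator = support(transactions, antecedent) * support(transactions, consequent)
    if denominator == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / denominator


# Hypothetical expression values for two made-up genes across six samples.
gene_a = [5.1, 0.2, 4.8, 0.1, 6.0, 0.3]
gene_b = [7.2, 0.5, 6.9, 0.4, 0.6, 0.2]

labels_a = discretize(gene_a, threshold=1.0)
labels_b = discretize(gene_b, threshold=1.0)

# One transaction per sample: the set of discretized feature=level items.
transactions = [
    frozenset({f"gene_a={a}", f"gene_b={b}"})
    for a, b in zip(labels_a, labels_b)
]

rule_support = support(transactions, {"gene_a=high", "gene_b=high"})
rule_lift = lift(transactions, {"gene_b=high"}, {"gene_a=high"})
print(rule_support, rule_lift)
```

A lift above 1 indicates that the two items co-occur more often than they would if they were independent, which is why lift is commonly paired with a minimum support when filtering rules.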

A Novel Market Basket Analysis Using Adaptive Association Rule Mining Algorithm

International Journal of Scientific Research, 10.15373/22778179/sep2012/9, 2012, Vol 1 (4), pp. 25-28

Author(s): M. Dhanabhakyam, Dr. M. Punithavalli

Keyword(s): Association Rule, Association Rule Mining, Market Basket Analysis, Rule Mining, Market Basket, Mining Algorithm

Study of Various Parallel Implementations of Association Rule Mining Algorithm

American Journal Of Advanced Computing, 10.15864/ajac.v2i1.94, 2015, Vol 2 (1)

Author(s): Sarbani Dasgupta

Keyword(s): Association Rule, Association Rule Mining, Rule Mining, Mining Algorithm, Parallel Implementations

Prediksi Code Defect Perangkat Lunak Dengan Metode Association Rule Mining dan Cumulative Support Thresholds (Software Code Defect Prediction Using Association Rule Mining and Cumulative Support Thresholds)

Jurnal Buana Informatika, 10.24002/jbi.v6i2.408, 2015, Vol 6 (2)

Author(s): Rizal Setya Perdana, Umi Laili Yuhana

Keyword(s): Association Rule, Association Rule Mining, Rule Mining, Program Code

Software quality is a research area in software engineering that plays a substantial role in building good-quality software systems. The prediction of software defects, which arise from deviations from the specification process or from anything that may cause operational failure, has been a research topic for more than 30 years. This paper focuses on predicting defects that occur in program code (code defects). The approach exploits software code patterns that are likely to cause defects, drawn from the NASA data set, to predict defects. Patterns are discovered using Association Rule Mining with Cumulative Support Thresholds, which automatically produces the optimal support and confidence values without requiring user input. In testing, the automatic prediction of software code defects achieved an accuracy of 82.35%.
