Novel Utility Procedure for Filtering High Associated Utility Items from Transactional Databases

In data mining, analyzing data from different transactional sources to discover optimal relations between itemsets is an active research area. In recent years, a number of algorithms and methods have been proposed to mine association-rule-based itemsets from transactional databases. Mining association itemsets with optimized high utility (such as profit) from transactional databases remains a challenging task, particularly with respect to execution time. We propose the High Utility based Association Pattern Growth (HUAPG) approach to extract high-utility association itemsets from transactional data sets based on user itemsets. The approach mines associated items from user-related itemsets using a utility data structure (UP-tree) to identify the itemsets of interest. The performance of the proposed approach was compared with hybrid and existing methods on synthetic data sets. Experimental results show that the proposed approach not only filters candidate itemsets but also reduces run time when the database contains a large number of transactions.
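HUAPG itself relies on a UP-tree structure, which the abstract does not detail. As a minimal sketch of the underlying notion the method optimizes, the following brute-force filter computes itemset utility (quantity times unit profit, summed over supporting transactions) and keeps itemsets above a utility threshold; the data, item names, and profits are invented for illustration.

```python
from itertools import combinations

# Hypothetical example data: each transaction maps items to purchase
# quantities, and `profit` gives the unit profit (external utility) per item.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 4, "c": 1, "d": 1},
]
profit = {"a": 5, "b": 2, "c": 1, "d": 4}

def utility(itemset, tx):
    """Utility of `itemset` in one transaction: sum of quantity * unit
    profit, counted only if every item of the set occurs in the transaction."""
    if not all(i in tx for i in itemset):
        return 0
    return sum(tx[i] * profit[i] for i in itemset)

def high_utility_itemsets(transactions, min_utility):
    """Enumerate all itemsets and keep those whose total utility across the
    database reaches `min_utility` (exponential; UP-tree methods avoid this)."""
    items = sorted({i for tx in transactions for i in tx})
    result = {}
    for r in range(1, len(items) + 1):
        for itemset in combinations(items, r):
            u = sum(utility(itemset, tx) for tx in transactions)
            if u >= min_utility:
                result[itemset] = u
    return result
```

Note that utility, unlike support, is not anti-monotone (a superset can have higher utility than its subsets), which is why tree-based approaches such as the one proposed here are needed to prune candidates efficiently.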

2015
Author(s):  
William E. Hammond ◽  
Vivian L. West ◽  
David Borland ◽  
Igor Akushevich ◽  
Eugenia M. Heinz

Author(s):  
D. Amarsaikhan

Abstract. The aim of this research is to classify urban land cover types using an advanced classification method. The features derived from Landsat 8 and Sentinel 1A SAR data sets are used as the input bands to the classification. To extract reliable urban land cover information from the optical and SAR features, a rule-based classification algorithm is constructed that uses spatial thresholds defined from contextual knowledge. The result of the constructed method is compared with that of a standard classification technique and shows higher accuracy. Overall, the study demonstrates that multisource data sets can considerably improve the classification of urban land cover types and that the rule-based method is a powerful tool for producing a reliable land cover map.
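A rule-based classifier of this kind is essentially a decision cascade over band-derived features. The sketch below illustrates the idea with two hypothetical features and invented thresholds; the actual features, thresholds, and class set defined from the study's contextual knowledge are not given in the abstract.

```python
def classify_pixel(features):
    """Toy rule-based land cover classifier. `features` holds hypothetical
    band-derived values: 'ndvi' (optical vegetation index, -1..1) and
    'sigma0' (SAR backscatter in dB). All thresholds are illustrative,
    not the ones defined in the study."""
    ndvi, sigma0 = features["ndvi"], features["sigma0"]
    if ndvi > 0.5:
        return "vegetation"            # strong optical vegetation signal
    if sigma0 > -5.0:
        return "built-up"              # strong double-bounce backscatter
    if ndvi < 0.0 and sigma0 < -15.0:
        return "water"                 # dark, smooth, non-vegetated surface
    return "bare soil"                 # fallback class
```

The advantage over a purely statistical classifier is that each decision is explicit and auditable, which is what makes the resulting land cover map easier to validate.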


2008
Vol 13 (3)
pp. 213-225
Author(s):  
Albert Orriols-Puig ◽  
Ester Bernadó-Mansilla

2014
Vol 10 (6)
pp. 2171-2199
Author(s):  
R. J. H. Dunn ◽  
M. G. Donat ◽  
L. V. Alexander

Abstract. We assess the effects of different methodological choices made during the construction of gridded data sets of climate extremes, focusing primarily on HadEX2. Using global land-surface time series of the indices and their coverage, as well as uncertainty maps, we show that the choices which have the greatest effect are those relating to the station network used or that drastically change the values for individual grid boxes. The latter are most affected by the number of stations required in or around a grid box and the gridding method used. Most parametric changes have a small impact, on global and on grid box scales, whereas structural changes to the methods or input station networks may have large effects. On grid box scales, trends in temperature indices are very robust to most choices, especially in areas which have high station density (e.g. North America, Europe and Asia). The precipitation indices, being less spatially correlated, can be more susceptible to methodological choices, but coherent changes are still clear in regions of high station density. Regional trends from all indices derived from areas with few stations should be treated with care. On a global scale, the linear trends over 1951–2010 from almost all choices fall within the 5–95th percentile range of trends from HadEX2. This demonstrates the robust nature of HadEX2 and related data sets to choices in the creation method.
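One of the methodological choices the abstract identifies as most influential is the minimum number of stations required per grid box. A minimal sketch of that choice, assuming a simple box-average gridding scheme (HadEX2's actual gridding is more sophisticated), is:

```python
import math

def grid_stations(stations, box_size=2.5, min_stations=3):
    """Average point observations onto a regular lat/lon grid, keeping only
    boxes with at least `min_stations` contributing stations; sparser boxes
    are left missing. `stations` is a list of (lat, lon, value) tuples.
    The 2.5-degree box size and threshold of 3 are illustrative defaults."""
    sums, counts = {}, {}
    for lat, lon, value in stations:
        box = (math.floor(lat / box_size), math.floor(lon / box_size))
        sums[box] = sums.get(box, 0.0) + value
        counts[box] = counts.get(box, 0) + 1
    return {box: sums[box] / counts[box]
            for box in sums if counts[box] >= min_stations}
```

Raising `min_stations` trades coverage for robustness: sparsely observed regions drop out entirely, which is exactly why the abstract cautions against trends derived from areas with few stations.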


Information
2020
Vol 11 (2)
pp. 100
Author(s):  
Ricardo A. Calix ◽  
Sumendra B. Singh ◽  
Tingyu Chen ◽  
Dingkai Zhang ◽  
Michael Tu

The cyber security toolkit, CyberSecTK, is a simple Python library for preprocessing and feature extraction of cyber-security-related data. As the digital universe expands, more and more data need to be processed using automated approaches. In recent years, cyber security professionals have seen opportunities to use machine learning approaches to help process and analyze their data. The challenge is that cyber security experts do not have the necessary training to apply machine learning to their problems. The goal of this library is to help bridge this gap. In particular, we propose the development of a toolkit in Python that can process the most common types of cyber security data, helping cyber experts to implement a basic machine learning pipeline from beginning to end. This work is our first attempt to achieve that goal. The proposed toolkit is a suite of program modules, data sets, and tutorials supporting research and teaching in cyber security and defense. Example use cases are presented and discussed, along with survey results from students who used some of the modules in the library.
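The kind of preprocessing such a toolkit automates is turning raw security records into numeric feature vectors. The sketch below is a generic illustration in plain Python, not CyberSecTK's actual API; the log format and feature names are invented.

```python
import re

def extract_features(log_line):
    """Turn one hypothetical firewall-style log line into a numeric feature
    dict suitable for a downstream ML model. The log format and the feature
    set are illustrative, not part of the CyberSecTK library."""
    tokens = log_line.split()
    # IPv4-looking substrings (a loose pattern; does not validate octets).
    ips = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log_line)
    return {
        "n_tokens": len(tokens),                       # record length proxy
        "n_ips": len(ips),                             # addresses mentioned
        "n_digits": sum(c.isdigit() for c in log_line),
        "has_deny": int("DENY" in log_line.upper()),   # blocked-traffic flag
    }
```

Feature dicts like this can be fed directly to `sklearn.feature_extraction.DictVectorizer` to complete the pipeline from raw logs to a trained model.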


Author(s):  
Balazs Feil ◽  
Janos Abonyi

This chapter aims to give a comprehensive view of the links between fuzzy logic and data mining. It will be shown that knowledge extracted from simple data sets or huge databases can be represented by fuzzy rule-based expert systems. It is highlighted that both model performance and interpretability of the mined fuzzy models are of major importance, and effort is required to keep the resulting rule bases small and comprehensible. Therefore, in recent years, soft-computing-based data mining algorithms have been developed for feature selection, feature extraction, model optimization, and model reduction (rule-base simplification). The application of these techniques is illustrated using the wine data classification problem. The results illustrate that fuzzy tools can be applied in a synergistic manner through the nine steps of knowledge discovery.
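A fuzzy rule-based classifier of the kind discussed here pairs membership functions with a small rule base. The sketch below is a deliberately tiny example in the spirit of the wine problem; the two features, the linguistic terms, the triangular membership parameters, and the rule consequents are all invented for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify_wine(alcohol, flavanoids):
    """Toy fuzzy rule base over two hypothetical wine features.
    Each rule fires with the minimum (t-norm) of its antecedent
    memberships; the winner-takes-all rule decides the class."""
    low_alc  = tri(alcohol, 10.0, 11.5, 13.0)
    high_alc = tri(alcohol, 12.0, 13.5, 15.0)
    low_flav  = tri(flavanoids, 0.0, 1.0, 2.0)
    high_flav = tri(flavanoids, 1.5, 3.0, 4.5)
    rules = [
        ("class 1", min(high_alc, high_flav)),  # IF alcohol high AND flavanoids high
        ("class 2", min(low_alc, low_flav)),    # IF alcohol low  AND flavanoids low
        ("class 3", min(high_alc, low_flav)),   # IF alcohol high AND flavanoids low
    ]
    return max(rules, key=lambda r: r[1])[0]
```

The interpretability argument of the chapter is visible even at this scale: each rule reads as a linguistic statement, so pruning a rule or merging two membership functions (rule-base simplification) has an obvious, inspectable effect.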


Entropy
2019
Vol 21 (5)
pp. 443
Author(s):  
Lianmeng Jiao ◽  
Xiaojiao Geng ◽  
Quan Pan

The belief rule-based classification system (BRBCS) is a promising technique for addressing different types of uncertainty in complex classification problems, by introducing the belief function theory into the classical fuzzy rule-based classification system. However, in the BRBCS, high numbers of instances and features generally induce a belief rule base (BRB) with large size, which degrades the interpretability of the classification model for big data sets. In this paper, a BRB learning method based on the evidential C-means clustering (ECM) algorithm is proposed to efficiently design a compact belief rule-based classification system (CBRBCS). First, a supervised version of the ECM algorithm is designed by means of weighted product-space clustering to partition the training set with the goals of obtaining both good inter-cluster separability and inner-cluster pureness. Then, a systematic method is developed to construct belief rules based on the obtained credal partitions. Finally, an evidential partition entropy-based optimization procedure is designed to get a compact BRB with a better trade-off between accuracy and interpretability. The key benefit of the proposed CBRBCS is that it can provide a more interpretable classification model on the premise of comparative accuracy. Experiments based on synthetic and real data sets have been conducted to evaluate the classification accuracy and interpretability of the proposal.
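At the core of a belief rule base, each rule carries a belief distribution over classes rather than a single crisp consequent. The sketch below shows only that core idea, heavily simplified: rule activation decays with distance to a prototype, and class beliefs are fused by activation-weighted averaging rather than the evidential combination the paper actually uses. Prototypes and belief degrees are invented.

```python
import math

def brb_classify(x, rules):
    """Evaluate a toy belief rule base. `rules` is a list of
    (prototype, belief_distribution) pairs, where the distribution maps
    class labels to belief degrees. Activation weights decay with squared
    distance to the prototype (a stand-in for antecedent matching degrees),
    and beliefs are fused by weighted averaging."""
    weights = [math.exp(-math.dist(x, proto) ** 2) for proto, _ in rules]
    total = sum(weights)
    classes = rules[0][1].keys()
    beliefs = {c: sum(w * beta[c] for w, (_, beta) in zip(weights, rules)) / total
               for c in classes}
    return max(beliefs, key=beliefs.get), beliefs
```

The interpretability concern raised in the abstract maps directly onto the number of `(prototype, distribution)` pairs: the ECM-based design in the paper aims to keep that rule count small without sacrificing accuracy.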


2008
Vol 159 (18)
pp. 2378-2398
Author(s):  
Alberto Fernández ◽  
Salvador García ◽  
María José del Jesus ◽  
Francisco Herrera
