IPOC: an efficient approach for dynamic association rule generation using incremental data with updating supports

Author(s):  
P. Naresh ◽  
R. Suguna

According to recent statistics, the online business sector has grown drastically, with more and more customers purchasing items online. As a result, retailers accumulate huge volumes of data from day-to-day operations and are keen to analyze that data to observe customer behavior toward items, which strengthens business promotions and catalog management. Such analysis reveals customer interests and frequent items in large data sets. Known algorithms exist for this task, dealing with static and dynamic data, but some of them are time- and memory-consuming and involve unnecessary processing. This paper intends to implement an efficient incremental pre-ordered coded tree (IPOC) generation for data updates and applies a frequent itemset generation algorithm to the tree. During incremental generation of the tree, new data items are linked to existing nodes in the tree by increasing their support counts. This removes the lag of existing algorithms, avoids mining from scratch, and reduces time and memory consumption through the use of the nodeset data structure. The results of the proposed method were observed and compared with those of existing methods. The proposed method shows improved results in terms of generated items, time and memory.
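The incremental update described in this abstract can be illustrated with a minimal sketch. This is a plain prefix tree, not the authors' IPOC structure: the pre-order coding and the nodeset structure are omitted, and the names `Node`, `insert_transaction` and `update` are hypothetical. It only shows how a new batch reuses existing nodes and raises their support counts instead of rebuilding the tree.

```python
class Node:
    """One node of a simple prefix tree.

    `count` is the number of transactions whose sorted item list
    passes through this node (the support of that prefix).
    """

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert_transaction(root, transaction):
    """Insert one transaction, reusing existing nodes where possible."""
    node = root
    # Assumption: a fixed (here lexicographic) item order, so that
    # transactions with the same prefix share the same path.
    for item in sorted(set(transaction)):
        child = node.children.get(item)
        if child is None:
            child = Node(item)
            node.children[item] = child
        child.count += 1  # update support in place
        node = child


def update(root, batch):
    """Apply a batch of new transactions to an existing tree."""
    for transaction in batch:
        insert_transaction(root, transaction)


root = Node()
update(root, [["a", "b"], ["a", "c"]])   # initial data
update(root, [["a", "b", "d"]])          # later increment, no rebuild
```

After the second call, the node for `a` has count 3 and the node for `a, b` has count 2, while only the node for `d` is newly created.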

2013 ◽  
Vol 23 (03) ◽  
pp. 1350012 ◽  
Author(s):  
FADI THABTAH ◽  
SUHEL HAMMOUD

Association rule mining is one of the primary tasks in data mining; it discovers correlations among items in a transactional database. The majority of vertical and horizontal association rule mining algorithms have been developed to improve the frequent items discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we overcome the problem of mining very large data by proposing a new parallel Map-Reduce (MR) association rule mining technique called MR-ARM that uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications due to its efficiency, simplicity and ease of use, and therefore the proposed algorithm develops a fast parallel distributed batch set intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have been developed, and a number of experiments against small, medium and large data collections have been conducted. The comparisons are based on the time required by the algorithm for data initialisation, frequent items discovery, rule generation, etc. The results show that MR-ARM is a very useful tool for mining association rules from large datasets in a distributed environment.
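The idea of finding frequent items by set intersection over a vertical layout can be sketched in a few lines. This is not the MR-ARM implementation: it runs in a single process, the "map" and "reduce" steps are ordinary Python functions, and the names `map_phase`, `reduce_phase` and `frequent_pairs` are hypothetical. It assumes each item is represented by the set of transaction ids (tids) that contain it.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions):
    """Emit (item, tid) pairs, as a mapper would for each input record."""
    for tid, items in enumerate(transactions):
        for item in set(items):
            yield item, tid


def reduce_phase(pairs):
    """Group the pairs by item into tid sets (the vertical layout)."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return dict(tidsets)


def frequent_pairs(tidsets, min_support):
    """The support of a pair is the size of the intersection of its tid sets."""
    frequent = {i: t for i, t in tidsets.items() if len(t) >= min_support}
    result = {}
    for a, b in combinations(sorted(frequent), 2):
        common = frequent[a] & frequent[b]
        if len(common) >= min_support:
            result[(a, b)] = len(common)
    return result


# Toy data, purely illustrative.
transactions = [["bread", "butter"], ["bread", "butter", "milk"], ["milk"]]
tidsets = reduce_phase(map_phase(transactions))
pairs = frequent_pairs(tidsets, min_support=2)
```

In a real distributed setting the grouping step would be performed by the framework's shuffle, and the intersections could be batched across workers.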


2017 ◽  
Vol 7 (1.5) ◽  
pp. 51
Author(s):  
M. Sireesha ◽  
Srikanth Vemuru ◽  
S. N. TirumalaRao

Frequent itemset mining and association rule mining are the key tasks in the knowledge discovery process. Various customized algorithms are used in association rule mining to find the set of frequent patterns. Although many algorithms exist, Apriori is one of the standard algorithms for finding frequent itemsets; however, it is inefficient because it scans the database several times and generates a large number of candidates. To overcome these limitations, this paper introduces a new algorithm called Coalesce based Binary Table. With this algorithm, the given database is scanned only once to generate a Binary Table, from which the frequent 1-itemsets are found. Next, the infrequent 1-itemsets are identified and removed from the Binary Table, and the remaining items are rearranged in ascending order of support. For each frequent 1-itemset, a Coalesce matrix and an Index List are found in order to generate all frequent itemsets having the same support count as the representative items; the remaining frequent itemsets are obtained in a depth-first manner. The significant benefits of the proposed method are that the whole database is scanned only once and that there is no need to generate and check each candidate to find the set of frequent items. In addition, frequent items having the same support counts as the representative items can be identified directly by joining the representative item with all the combinations of the Coalesce matrix. The Coalesce based Binary Table therefore cuts the time needed to identify the frequent itemsets, which improves efficiency.
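The first steps described in this abstract (one scan to build a binary table, then pruning and ordering by support) can be sketched as follows. The Coalesce matrix and Index List are not reproduced here, since the abstract does not define them; the function names and the column-wise table layout are assumptions made for illustration.

```python
def build_binary_table(transactions):
    """Build one 0/1 column per item in a single pass over the data."""
    columns = {}
    rows_seen = 0
    for transaction in transactions:
        present = set(transaction)
        for item in present:
            if item not in columns:
                # The item was absent from every earlier row.
                columns[item] = [0] * rows_seen
        for item, column in columns.items():
            column.append(1 if item in present else 0)
        rows_seen += 1
    return columns


def frequent_one_itemsets(columns, min_support):
    """Drop infrequent items and sort the rest by ascending support."""
    supports = {item: sum(column) for item, column in columns.items()}
    kept = [item for item, s in supports.items() if s >= min_support]
    kept.sort(key=lambda item: (supports[item], item))
    return [(item, supports[item]) for item in kept]


# Toy data, purely illustrative.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
table = build_binary_table(transactions)
frequent = frequent_one_itemsets(table, min_support=2)
```

Ties in support are broken alphabetically here only to make the order deterministic; the paper may use a different tie-breaking rule.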


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lei Li ◽  
Desheng Wu

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the infraction risk of listed firms, addressing financial supervision that lacks effectiveness and precision.
Design/methodology/approach: The overall research framework designed for forecasting the infractions (ISRs) includes data collection and cleaning, feature engineering, data split, prediction approach application and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network and Long Short-Term Memory Networks (LSTMs) as ISRs prediction models.
Findings: The research results show that the prediction performance of the proposed models with prior infractions is significantly better than that of the models without them, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, the previous infractions of the company, and large data sets.
Originality/value: The findings could be utilized, to a certain degree, to address the problem of identifying listed firms' ISRs. Overall, the results elucidate the value of the prior infraction of securities regulations (ISRs). This shows the importance of including more data sources when constructing distress models, rather than only focusing on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.


Author(s):  
Amin A. Abdulghani

A lot of interest has been expressed in database mining using association rules (Agrawal, Imielinski, & Swami, 1993). In this chapter, we provide a different view of the association rules, referred to as cubegrades (Imielinski, Khachiyan, & Abdulghani, 2002). An example of a typical association rule states that, say, 23% of supermarket transactions (so-called market basket data) which buy bread and butter also buy cereal (that percentage is called confidence) and that 10% of all transactions buy bread and butter (this is called support). Bread and butter represent the body of the rule and cereal constitutes the consequent of the rule. This statement is typically represented as a probabilistic rule. But association rules can also be viewed as statements about how the cell representing the body of the rule is affected by specializing it by adding an extra constraint expressed by the rule’s consequent. Indeed, the confidence of an association rule can be viewed as the ratio of the support drop, when the cell corresponding to the body of a rule (in our case the cell of transactions buying bread and butter) is augmented with its consequent (in this case cereal). This interpretation gives association rules a “dynamic flavor” reflected in a hypothetical change of support affected by specializing the body cell to a cell whose description is a union of body and consequent descriptors. For example, our earlier association rule can be interpreted as saying that the count of transactions buying bread and butter drops to 23% of the original when restricted (rolled down) to the transactions buying bread, butter and cereal. In other words, this rule states how the count of transactions supporting buyers of bread and butter is affected by buying cereal as well.
With such interpretation in mind, a much more general view of association rules can be taken, when support (count) can be replaced by an arbitrary measure or aggregate and the specialization operation can be substituted with a different “delta” operation. Cubegrades capture this generalization. Conceptually, this is very similar to the notion of gradients used in calculus. By definition the gradient of a function between the domain points x1 and x2 measures the ratio of the delta change in the function value over the delta change between the points. For a given point x and function f(), it can be interpreted as a statement of how a change in the value of x (Δx) affects a change of value in the function (Δf(x)).
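The reading of confidence as a ratio of counts between a body cell and its specialization can be made concrete with a small sketch. The helper names `cell`, `support` and `confidence` are hypothetical, and the transactions are toy data chosen for illustration, not the figures quoted in the chapter.

```python
def cell(transactions, descriptor):
    """Return the transactions that contain every item in `descriptor`."""
    wanted = set(descriptor)
    return [t for t in transactions if wanted <= set(t)]


def support(transactions, body):
    """Fraction of all transactions that fall in the body cell."""
    if not transactions:
        return 0.0
    return len(cell(transactions, body)) / len(transactions)


def confidence(transactions, body, consequent):
    """Ratio of the specialized cell's count to the body cell's count."""
    body_count = len(cell(transactions, body))
    if body_count == 0:
        return 0.0
    specialized = set(body) | set(consequent)
    return len(cell(transactions, specialized)) / body_count


# Toy data, purely illustrative.
transactions = [
    {"bread", "butter", "cereal"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]
s = support(transactions, {"bread", "butter"})
c = confidence(transactions, {"bread", "butter"}, {"cereal"})
```

Here four of five transactions fall in the body cell, and one of those four survives the roll-down to bread, butter and cereal. Replacing `len` with another aggregate over the cell (for example a sum of sales amounts) gives the kind of generalized measure that the cubegrade view allows.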


2008 ◽  
pp. 2105-2120
Author(s):  
Kesaraporn Techapichetvanich ◽  
Amitava Datta

Both visualization and data mining have become important tools in discovering hidden relationships in large data sets, and in extracting useful knowledge and information from large databases. Even though many algorithms for mining association rules have been researched extensively in the past decade, they do not incorporate users in the association-rule mining process. Most of these algorithms generate a large number of association rules, some of which are not practically interesting. This chapter presents a new technique that integrates visualization into the mining association rule process. Users can apply their knowledge and be involved in finding interesting association rules through interactive visualization, after obtaining visual feedback as the algorithm generates association rules. In addition, the users gain insight and deeper understanding of their data sets, as well as control over mining meaningful association rules.


Author(s):  
Ling Zhou ◽  
Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. However, in recent years, there has been an increasing demand for mining infrequent items (such as rare but expensive items). Since exploring interesting relationships among infrequent items has not been discussed much in the literature, in this chapter, the authors propose two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors’ schemes compare favorably to Apriori and FP-growth under the situation being evaluated. In addition, they explore quantitative association rule mining in transactional databases among infrequent items by associating quantities of items; some interesting examples are drawn to illustrate the significance of such mining.


Author(s):  
Reshu Agarwal

A modified framework that applies temporal association rule mining to inventory management is proposed in this article. The ordering policy of frequent items is determined, and inventory is classified based on the loss rule. This helps inventory managers to determine the optimum order quantity of frequent items together with the most profitable item in each time-span. An example is presented to validate the results.


