Near Candidate-Less Apriori with Tidlists and Other Apriori Implementations

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

In this study we implemented four different versions of Apriori, namely basic (and basic multi-threaded), bloom filter, trie, and count-min sketch, and proposed a new algorithm, NCLAT (Near Candidate-Less Apriori with Tidlists). We compared the runtimes and maximum memory usage of our implementations with each other and, in some cases, with the runtime of Borgelt's Apriori implementation. The NCLAT implementation is more efficient than the other Apriori implementations that we know of in terms of the number of times the database is scanned and the number of candidates generated. Unlike the original Apriori algorithm, which scans the database at every level and creates all of the candidates for each level in advance, NCLAT scans the database only once and creates candidate itemsets only for level one, but not afterwards. Thus, the number of candidates created is equal to the number of unique items in the database.
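For readers unfamiliar with the tidlist idea described above (scan once, candidates only at level one, supports via intersections), here is a minimal, hedged sketch; it is an Eclat-style illustration of the general technique, not the authors' NCLAT implementation, and all data are invented.

```python
from collections import defaultdict
from itertools import combinations

def mine_with_tidlists(transactions, min_support):
    """Scan the database once to build a tidlist (set of transaction ids)
    per item; supports of larger itemsets come from tidlist intersections,
    so no further database scans are needed."""
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(transactions):      # the single scan
        for item in transaction:
            tidlists[item].add(tid)

    min_count = min_support * len(transactions)
    frequent = {frozenset([i]): t for i, t in tidlists.items() if len(t) >= min_count}
    level = frequent

    while level:
        next_level = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            union = a | b
            if len(union) == len(a) + 1:                   # join itemsets differing in one item
                tids = ta & tb                             # support by intersection, no rescan
                if union not in next_level and len(tids) >= min_count:
                    next_level[union] = tids
        frequent.update(next_level)
        level = next_level
    return {itemset: len(tids) for itemset, tids in frequent.items()}

db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(mine_with_tidlists(db, min_support=0.5))
```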

2014 ◽  
Vol 556-562 ◽  
pp. 1510-1514
Author(s):  
Li Qiang Lin ◽  
Hong Wen Yan

To address the low efficiency of candidate itemset generation in the Apriori algorithm, this paper presents a method based on property division to improve the generation of candidate itemsets. The improved Apriori algorithm is compared with the existing algorithm and applied to power system accident cases under extreme climate conditions. The experimental results show that the improved algorithm significantly improves the time efficiency of candidate itemset generation, and that it can find association rules among time, space, disasters, and faulted facilities in power system accident cases under extreme climate conditions. This is very useful for power system fault analysis.
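For context, the following is a minimal sketch of the classical candidate-generation step (join plus prune) whose cost the improved algorithm targets; the paper's property-division method itself is not shown, and the example itemsets are invented.

```python
from itertools import combinations

def apriori_gen(frequent_k, k):
    """Classical Apriori candidate generation: join frequent k-itemsets whose
    union has k+1 items, then prune candidates that have an infrequent k-subset."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1:
                # prune: every k-subset of the candidate must itself be frequent
                if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                    candidates.add(union)
    return candidates

l2 = {frozenset(x) for x in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
print(apriori_gen(l2, 2))   # only {'a','b','c'} survives; {'b','c','d'} is pruned
```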


2021 ◽  
Vol 5 (3) ◽  
pp. 1107
Author(s):  
Siti Nurlela ◽  
Lilyani Asri Utami

The development of the automotive industry in Indonesia can be classified as very rapid and increases annually, creating highly competitive conditions because many companies offer various motorcycle brands with competitive quality and prices. The company must create a marketing strategy that can increase the sales efficiency of Yamaha motorcycle products. To overcome this problem, a strategy is needed that can help increase sales of motorcycle products by utilizing the sales data owned by the company. Data mining can be used to process the company's sales data by searching for association rules with the Apriori algorithm on motorcycle product variables. The association rule analysis of the sales data, with a minimum support of 30% and a minimum confidence of 75%, produced 3 rules involving the 3 products most in demand by consumers, namely the NEW MIO M3 CW, NEW AEROX 155 VVA, and N-MAX. By knowing the best-selling products, the company can increase the supply of those products and develop a marketing strategy that markets them together with other products by examining the comparative advantage of the best-selling products over the others.
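As a small worked illustration of how the 30% support and 75% confidence thresholds act on transaction data, consider the sketch below; the baskets and the shortened product names are invented for illustration and are not the paper's data.

```python
# Toy transactions; each set is the products in one sale (illustrative only).
transactions = [
    {"N-MAX", "NEW AEROX"}, {"N-MAX"}, {"NEW MIO", "N-MAX"},
    {"NEW MIO", "NEW AEROX", "N-MAX"}, {"NEW MIO"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(A ∪ B) / support(A) for the rule A -> B."""
    return support(antecedent | consequent) / support(antecedent)

rule = (frozenset({"NEW AEROX"}), frozenset({"N-MAX"}))
print(support(rule[0] | rule[1]))   # 0.4 -> passes the 30% minimum support
print(confidence(*rule))            # 1.0 -> passes the 75% minimum confidence
```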


2014 ◽  
Vol 1044-1045 ◽  
pp. 846-849
Author(s):  
Liu Liu ◽  
Cheng Qian Ma

According to the characteristics of the monitoring system, the history database is first preprocessed with the Apriori association rule algorithm, so that data useful to the system are extracted and an initial sample data set is obtained. Second, the data are analyzed, trained, and processed with an artificial neural network to make the monitoring system's control strategy intelligent. The model can also be used to forecast useful data for the monitoring system, making tunnel management more efficient.
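A loose, hypothetical sketch of the two-stage pipeline as we read it (an Apriori-filtered sample feeding a neural network) is given below; the `MLPRegressor` stand-in, the column selection, and all data are placeholders, not the paper's model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor   # assumed stand-in for the paper's ANN

history = np.random.rand(200, 6)        # placeholder monitoring history records
targets = history[:, :3].sum(axis=1)    # placeholder quantity to forecast

# Stage 1 (stand-in): keep only the columns that frequent-pattern mining
# flagged as useful; in the real system this selection would come from Apriori.
selected_columns = [0, 1, 2]
samples = history[:, selected_columns]

# Stage 2: train the network on the reduced sample and forecast.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(samples, targets)
print(model.predict(samples[:3]))
```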


2011 ◽  
Vol 2 (2) ◽  
Author(s):  
Denny Haryanto ◽  
Yetli Oslan ◽  
Djoni Dwiyana

Abstract. Implementation of Shopping Cart Analysis with Association Rules using the Apriori Algorithm on Motorcycle Spare Parts Sales. At a distributor agent, most sales transactions are recorded in one information system. The recorded data are used only for administrative purposes, even though they contain information that can be processed for other purposes. One of these is to find special relationships between products purchased at the same time. Based on these relationships, it is possible to promote items according to the bond patterns of the products. Consumers who buy a product will be interested in buying other products that are commonly bought with it. If consumers do not buy the products that appear in a sales pattern, the distributor can offer them the products in that pattern. One algorithm for discovering such product pattern combinations is the Apriori algorithm. The use of association methods in searching for bond patterns of products for promotion is intended to minimize the promotion of products that have a low level of sales. By minimizing promotions of items that are not purchased, consumers will not be disturbed by promotions of items that have no bond pattern, so the promotions will be more effective. Keywords: Apriori Algorithm, Association Rules, Sales Promotion, Sales Transaction, Bond Pattern
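To make the promotion use case concrete, here is a small sketch of how cross-selling rules can be read off a frequent itemset via confidence; the spare-part names and support values are hypothetical, not taken from the paper.

```python
from itertools import combinations

def rules_from_itemset(itemset, support_of, min_conf):
    """Enumerate rules A -> B (A and B partition the itemset) whose
    confidence support(itemset)/support(A) passes min_conf."""
    rules = []
    items = frozenset(itemset)
    for r in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, r)):
            consequent = items - antecedent
            conf = support_of[items] / support_of[antecedent]
            if conf >= min_conf:
                rules.append((set(antecedent), set(consequent), conf))
    return rules

# Hypothetical supports for spare-part itemsets
support_of = {
    frozenset({"spark plug"}): 0.40,
    frozenset({"oil filter"}): 0.35,
    frozenset({"spark plug", "oil filter"}): 0.30,
}
print(rules_from_itemset({"spark plug", "oil filter"}, support_of, min_conf=0.7))
```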


Association rule mining techniques are an important part of data mining, used to derive relationships between attributes of large databases. Association rule mining has attracted great interest among researchers because many challenging problems can be solved with it. Numerous algorithms have been developed for deriving association rules effectively. It has been observed that not all algorithms give similar results in all scenarios, so understanding their relative merits becomes important. In this paper two association rule mining algorithms are analyzed: the popular Apriori algorithm and EARMGA (Evolutionary Association Rules Mining with Genetic Algorithm). The two algorithms are compared experimentally on different datasets with respect to parameters such as the number of rules generated, average support, average confidence, and covered records.
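For concreteness, the sketch below shows how the comparison metrics named above could be computed for a mined rule set; the rules and transactions are invented, not the paper's experimental data.

```python
# rule = (antecedent, consequent, support, confidence); all values illustrative.
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
rules = [
    (frozenset({"a"}), frozenset({"b"}), 0.50, 0.67),
    (frozenset({"b"}), frozenset({"c"}), 0.50, 0.67),
]

n_rules = len(rules)
avg_support = sum(r[2] for r in rules) / n_rules
avg_confidence = sum(r[3] for r in rules) / n_rules
# A record is "covered" if it contains the full itemset of at least one rule.
covered = sum(any((ante | cons) <= t for ante, cons, *_ in rules) for t in transactions)

print(n_rules, avg_support, avg_confidence, covered)
```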


Author(s):  
Hiroshi Sakai ◽  
Kao-Yi Shen ◽  
Michinori Nakata ◽  
...  

This paper focuses on two Apriori-based rule generators. The first is the rule generator in Prolog and C, and the second is the one in SQL. They are named Apriori in Prolog and Apriori in SQL, respectively. Each rule generator is based on the Apriori algorithm. However, each rule generator has its own properties. Apriori in Prolog employs the equivalence classes defined by table data sets and follows the framework of rough sets. On the other hand, Apriori in SQL employs a search for rule generation and does not make use of equivalence classes. This paper clarifies the properties of these two rule generators and considers effective applications of each to existing data sets.
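As a rough illustration of the SQL flavour, the sketch below counts the support of 2-item candidates with a self-join and GROUP BY over a toy SQLite table; the schema and data are assumed for illustration and are not taken from the paper.

```python
import sqlite3

# Transaction table in the usual (tid, item) layout; data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE baskets (tid INTEGER, item TEXT);
    INSERT INTO baskets VALUES (1,'a'),(1,'b'),(2,'a'),(2,'b'),(2,'c'),(3,'b'),(3,'c');
""")

# Support counting for item pairs purely in SQL: self-join on tid, group, filter.
pair_counts = conn.execute("""
    SELECT x.item, y.item, COUNT(*) AS support
    FROM baskets x JOIN baskets y ON x.tid = y.tid AND x.item < y.item
    GROUP BY x.item, y.item
    HAVING COUNT(*) >= 2
""").fetchall()
print(pair_counts)   # [('a', 'b', 2), ('b', 'c', 2)]
```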


Author(s):  
Antony Stevens

ABSTRACT
Objective: Bloom Filters have been used in a number of studies conducted for the Ministry of Health. They are usually recommended because of the possibility that they may participate in secure protocols for the exchange of data. In our case the speed of the program, once the filters have been prepared, is so high that this alone is sufficient motive for their adoption. Nevertheless, if two calendar dates differ by one character, this may merit more attention than a similar difference in personal names. This became evident in a large linkage between mortality records and hospital separations where the patient had died. Higher scores were obtained when the date fields differed by only one character, but when that character represented a year there would be no reason to notice the pair. When the character difference was compatible with a difference of a few days, the pair would be more interesting, because in studies like the one just cited it would be reasonable to admit differences of a few days or even, perhaps, weeks between the events (the recording of the death of the patient).
Approach: How, then, to represent the difference between dates in a Bloom Filter? A date can be represented as a Boolean vector where the day (or week) is set to '1'. It may be represented by several contiguous '1's to admit admissible uncertainty in comparisons. The similarity between two dates can then simply be the Dice Coefficient of the corresponding vectors.
Result: A vector representing a date may then be very large. It could be as much as 365 bits per year, far more than is usually used for the other fields. The number of logical word comparisons would go up and the program would become slower. Knowing that the admissible range is represented by contiguous '1's means that we can obtain the effect of constructing the Bloom Filter and calculating the Dice Coefficient more directly. Starting with the two dates, we can obtain the number of bits that are shared, which will depend on the admissible range. The Dice Coefficient can then be calculated directly without the need to construct the filter.
Conclusion: We are then left with the decision of how to add the result to the value obtained from the other variables, and this will depend on what importance it is felt the date should have.
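A small sketch of the shortcut described in the Result: with each date encoded as `width` contiguous set bits starting at its day index, the Dice coefficient follows directly from the day difference, without ever building the bit vector. The 7-day window width is an assumed example, not a value from the study.

```python
from datetime import date

def date_dice(d1, d2, width=7):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for two runs of `width` set bits,
    where the runs overlap by max(0, width - day difference)."""
    overlap = max(0, width - abs((d1 - d2).days))
    return 2 * overlap / (2 * width)

print(date_dice(date(2015, 3, 10), date(2015, 3, 12)))   # 5/7 ≈ 0.714: a few days apart
print(date_dice(date(2015, 3, 10), date(2016, 3, 10)))   # 0.0: a year apart, no overlap
```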


2018 ◽  
Vol 3 (1) ◽  
pp. 89
Author(s):  
Rintho Rante Rerung

In a business, efforts are required to maximize profits, one of which is promotion. Promotion can be carried out in many ways, for example online through the social media platform Facebook and sites that provide advertisements. However, to obtain maximum results, it is necessary to calculate how likely a customer is to be interested in the product being offered. This study aims to apply data mining to product promotion for the Distro Nasional store. In the field of data mining there is a method called association rules. This method is intended to show the associative values among the types of products bought by customers, so that a pattern emerges of which products are frequently bought by a given customer. Knowing which products are frequently bought can serve as a basis for deciding which products are suitable to promote to that customer. The Apriori algorithm is also used to determine the frequent itemsets, so that the final result is the percentage of customer interest (confidence) in the product being offered. Keywords: promotion, data mining, association rule, product
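A minimal sketch of the stated end goal follows: reporting the confidence (the "percentage of interest") of rules whose antecedent matches a customer's past purchases. The rules and the basket are invented for illustration.

```python
# antecedent -> (recommended product, confidence); values are hypothetical.
rules = {
    frozenset({"t-shirt"}): ("hoodie", 0.80),
    frozenset({"t-shirt", "hoodie"}): ("cap", 0.65),
}
customer_basket = {"t-shirt"}

for antecedent, (product, conf) in rules.items():
    # Promote a product when the customer's purchases contain the rule's antecedent.
    if antecedent <= customer_basket and product not in customer_basket:
        print(f"promote {product}: estimated interest {conf:.0%}")
# -> promote hoodie: estimated interest 80%
```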


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Sufang Zhou ◽  
Shundong Li ◽  
Jiawei Dou ◽  
Yaling Geng ◽  
Xin Liu

The secure subset problem is important in secure multiparty computation, which is a vital field of cryptography. Most of the existing protocols for this problem can only keep the elements of one set private, while leaking the elements of the other set; in other words, they cannot solve the secure subset problem perfectly. The few studies that have addressed truly secure subsets rely mainly on oblivious polynomial evaluation, which is computationally inefficient. In this study, we first design an efficient secure subset protocol for sets whose elements are drawn from a known set, based on a new encoding method and a homomorphic encryption scheme. If the elements of the sets are taken from a large domain, this protocol is inefficient. Using a Bloom filter and a homomorphic encryption scheme, we further present an efficient protocol with computational complexity linear in the cardinality of the large set, which is practical for inputs consisting of large amounts of data. However, the second protocol may yield a false positive; this probability can be rapidly decreased by re-executing the protocol with different hash functions. Furthermore, we present experimental performance analyses of these protocols.
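A back-of-the-envelope sketch of the false-positive remark above: the standard Bloom-filter estimate gives a false-positive probability of roughly (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted elements, and re-executing with independent hash functions multiplies the probabilities, under the simplifying assumption that the runs are independent. The parameter values are illustrative only.

```python
from math import exp

def bloom_fp(m, k, n):
    """Approximate Bloom-filter false-positive probability for m bits,
    k hash functions, and n inserted elements."""
    return (1 - exp(-k * n / m)) ** k

p = bloom_fp(m=10_000, k=5, n=1_000)
print(p)          # single run: ~0.009
print(p ** 3)     # three re-executions with independent hashes: ~1e-6
```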


2014 ◽  
Author(s):  
Li Song ◽  
Liliana Florea ◽  
Ben Langmead

Lighter is a fast, memory-efficient tool for correcting sequencing errors. Lighter avoids counting k-mers. Instead, it uses a pair of Bloom filters, one holding a sample of the input k-mers and the other holding k-mers likely to be correct. As long as the sampling fraction is adjusted in inverse proportion to the depth of sequencing, Bloom filter size can be held constant while maintaining near-constant accuracy. Lighter is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy.
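A small sketch of the sizing argument above: choosing the sampling fraction inversely proportional to sequencing depth keeps the expected number of sampled k-mer occurrences roughly constant, so one fixed Bloom-filter size suffices. The constant and the genome size below are illustrative, not Lighter's actual defaults.

```python
genome_kmers = 3_000_000          # distinct k-mers in the genome (assumed)

for depth in (20, 50, 100, 400):
    alpha = 1.5 / depth           # sampling fraction ~ C / depth (C assumed)
    expected_sampled = genome_kmers * depth * alpha
    print(depth, round(alpha, 4), int(expected_sampled))
# expected_sampled is 4,500,000 at every depth, so one Bloom filter size fits all
```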

