RDFRules: Making RDF rule mining easier and even more efficient

Semantic Web
2021
pp. 1-34
Author(s):
Václav Zeman
Tomáš Kliegr
Vojtěch Svátek

AMIE+ is a state-of-the-art algorithm for learning rules from RDF knowledge graphs (KGs). Based on association rule learning, AMIE+ constituted a breakthrough in terms of speed on large data compared to the previous generation of ILP-based systems. In this paper we present several algorithmic extensions to AMIE+ which make it faster, along with support for data pre-processing and model post-processing, which provides more comprehensive coverage of the linked data mining process than the original AMIE+ implementation. The main contributions are related to performance improvement: (1) the top-k approach, which addresses the problem of combinatorial explosion often resulting from a hand-set minimum support threshold, (2) a grammar that allows fine-grained patterns to be defined, reducing the size of the search space, and (3) a faster projection binding that reduces the number of repetitive calculations. Other enhancements include the possibility to mine across multiple graphs, support for discretization of continuous values, and selection of the most representative rules using proven rule pruning and clustering algorithms. Benchmarks show reductions in mining time of up to several orders of magnitude compared to AMIE+. An open-source implementation is available under the name RDFRules at https://github.com/propi/rdfrules.
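As context for the rule metrics discussed in the abstract, the following is a minimal sketch (not RDFRules code; the toy triples and predicate names are invented for illustration) of how support and standard confidence of a single Horn rule, ( ?x bornIn ?y ) => ( ?x livesIn ?y ), can be counted over RDF-style triples:

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("alice", "bornIn", "prague"), ("alice", "livesIn", "prague"),
    ("bob", "bornIn", "brno"), ("bob", "livesIn", "prague"),
    ("carol", "bornIn", "brno"), ("carol", "livesIn", "brno"),
}

def rule_metrics(triples, body_pred, head_pred):
    """Support and standard (closed-world) confidence of the one-atom rule
    body_pred(x, y) => head_pred(x, y)."""
    body = {(s, o) for s, p, o in triples if p == body_pred}
    head = {(s, o) for s, p, o in triples if p == head_pred}
    support = len(body & head)        # bindings satisfying body and head
    confidence = support / len(body)  # fraction of body bindings confirmed
    return support, confidence

support, confidence = rule_metrics(triples, "bornIn", "livesIn")
```

Systems like AMIE+ evaluate many such candidate rules; the top-k approach mentioned above prunes this search by keeping only the best-scoring rules instead of all rules above a fixed support threshold.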

2008
Vol 17 (06)
pp. 1109-1129
Author(s):
BASILIS BOUTSINAS
COSTAS SIOTOS
ANTONIS GEROLIMATOS

One of the most important data mining problems is learning association rules of the form "90% of the customers that purchase product x also purchase product y". Discovering association rules from huge volumes of data requires substantial processing power. In this paper we present an efficient distributed algorithm for mining association rules that reduces the processing time by an order of magnitude, rendering it suitable for scaling up to very large data sets. The proposed algorithm is based on partitioning the initial data set into subsets and processing each subset in parallel. By reducing the support threshold while processing the subsets, the proposed algorithm can maintain the set of association rules that would be extracted by applying an association rule mining algorithm to all the data. This is confirmed by the empirical tests that we present, which also demonstrate the utility of the method.
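The partition-based idea above can be sketched as follows. This is a simplified illustration (function names, the size-2 itemset cap, and the toy data are assumptions, not the paper's algorithm): any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a safe candidate set that a second counting pass then verifies.

```python
from collections import Counter
from itertools import combinations

def local_frequent(partition, rel_support):
    """Itemsets (size <= 2 for brevity) locally frequent in one partition."""
    counts = Counter()
    for txn in partition:
        for k in (1, 2):
            counts.update(combinations(sorted(txn), k))
    min_count = max(1, int(rel_support * len(partition)))
    return {i for i, c in counts.items() if c >= min_count}

def global_frequent(partitions, rel_support):
    # Union of locally frequent itemsets is a superset of the global
    # answer; one verification pass over all data removes false positives.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, rel_support)
    txns = [t for part in partitions for t in part]
    counts = Counter()
    for txn in txns:
        s = set(txn)
        counts.update(c for c in candidates if set(c) <= s)
    return {c for c in candidates if counts[c] >= rel_support * len(txns)}
```

In a distributed setting each `local_frequent` call would run on a separate node, which is where the speedup comes from.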


2012
Vol 217-219
pp. 2381-2387
Author(s):
Doru Romulus Pascu
Radu Alexandru Roşu
Iuliana Duma
Horia Daşcău

Non-alloyed P355NH steel according to EN 10028-3:2003 belongs to a group of fine-grained steels for pressure vessels used in the welded construction of decompression chambers for divers. The experimentally determined chemical, structural and mechanical characteristics and toughness values place the analyzed steel in the P355NH steel group according to EN 10028-3:2003. The toughness of the analyzed steel at the test temperature of -30°C is characterized by high fracture energy values KV, between 48 and 86 J in the longitudinal direction and between 17 and 34 J in the transverse direction. The steel toughness at -30°C required by the ABS standard (Section 4/5.3 and Table 1) provides for a breaking energy KV of min. 35 J with ductile fracture surfaces, a value not met by some lots of the three batches (A, B, C) of steel. Finally, based on the direct correlation established between the HV10 hardness of the fine structure and the toughness, a selection was made of the lots of non-alloy P355NH steel that meet the ABS norm for the welded construction of decompression chambers for divers.


Author(s):  
Marlene Arangú
Miguel Salido

A fine-grained arc-consistency algorithm for non-normalized constraint satisfaction problems

Constraint programming is a powerful software technology for solving numerous real-life problems. Many of these problems can be modeled as Constraint Satisfaction Problems (CSPs) and solved using constraint programming techniques. However, solving a CSP is NP-complete, so filtering techniques to reduce the search space are still necessary. Arc-consistency algorithms are widely used to prune the search space. The concept of arc-consistency is bidirectional, i.e., it must be ensured in both directions of the constraint (direct and inverse constraints). Two of the best-known and most frequently used arc-consistency algorithms for filtering CSPs are AC3 and AC4. These algorithms repeatedly carry out revisions and require support checks for identifying and deleting all unsupported values from the domains. Nevertheless, many revisions are ineffective, i.e., they cannot delete any value yet consume a lot of checks and time. In this paper, we present AC4-OP, an optimized version of AC4 that manages binary and non-normalized constraints in only one direction, storing the supports found in the inverse direction for later evaluation. Thus, it reduces the propagation phase, avoiding unnecessary or ineffective checking. The use of AC4-OP reduces the number of constraint checks by 50% while pruning the same search space as AC4. The evaluation section shows the improvement of AC4-OP over AC4, AC6 and AC7 on random and non-normalized instances.
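For context, here is a sketch of the classic AC3 algorithm that such work optimizes against (this is standard textbook AC3, not AC4-OP itself). Note that both directions of each constraint must be listed and revised explicitly, which is exactly the kind of redundancy AC4-OP aims to avoid:

```python
from collections import deque

def revise(domains, constraint, xi, xj):
    """Delete values of xi with no support in xj; a revision that deletes
    nothing is 'ineffective' yet still costs support checks."""
    removed = False
    for v in list(domains[xi]):
        if not any(constraint(v, w) for w in domains[xj]):
            domains[xi].remove(v)
            removed = True
    return removed

def ac3(domains, constraints):
    """constraints maps a directed arc (xi, xj) to a binary predicate."""
    queue = deque(constraints)
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints[(xi, xj)], xi, xj):
            # Re-enqueue every arc pointing at xi, since its domain shrank.
            queue.extend(arc for arc in constraints
                         if arc[1] == xi and arc[0] != xj)
    return domains

# Example: x < y over domains {1, 2, 3}; both arc directions are explicit.
doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
arcs = {("x", "y"): lambda a, b: a < b, ("y", "x"): lambda a, b: a > b}
ac3(doms, arcs)
```

After propagation, x loses 3 (no larger y) and y loses 1 (no smaller x).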


2020
Vol 1 (3)
pp. 1-7
Author(s):
Sarbani Dasgupta
Banani Saha

In data mining, the Apriori technique is generally used for frequent itemset mining and association rule learning over transactional databases. The frequent itemsets generated by the Apriori technique provide association rules which are used for finding trends in the database. As the size of the database increases, a sequential implementation of the Apriori technique will take a lot of time, and at some point the system may crash. To overcome this problem, several algorithms for parallel implementation of the Apriori technique have been proposed. This paper gives a comparative study of various parallel implementations of the Apriori technique. It also focuses on the advantages of using MapReduce, a technology widely used for parallel mining of large datasets.
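The MapReduce-style candidate counting mentioned above can be illustrated with a minimal single-process sketch (function names and data are illustrative assumptions; a real deployment would run the mapper over distributed chunks and shuffle the emitted pairs to reducers):

```python
from collections import Counter

def map_candidates(txn_chunk, candidates):
    """Mapper: emit (itemset, 1) for each candidate contained in a transaction."""
    for txn in txn_chunk:
        s = set(txn)
        for cand in candidates:
            if set(cand) <= s:
                yield cand, 1

def reduce_counts(pairs, min_count):
    """Reducer: sum per-chunk counts and keep only the frequent itemsets."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return {k: v for k, v in counts.items() if v >= min_count}

# Two "chunks" standing in for data splits on two worker nodes.
chunks = [[{"a", "b"}, {"a"}], [{"a", "b"}, {"b", "c"}]]
pairs = [p for chunk in chunks
         for p in map_candidates(chunk, [("a",), ("a", "b")])]
frequent = reduce_counts(pairs, 2)
```

Because the mapper is stateless per chunk, adding nodes scales the counting phase almost linearly, which is the advantage the survey highlights.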


Author(s):  
Humera Farooq
Nordin Zakaria
Muhammad Tariq Siddique

The visualization of the search space makes it easy to understand the behavior of the Genetic Algorithm (GA). The authors propose a novel way to represent the multidimensional search space of the GA using a 2-D graph. This is carried out based on the gene values of the current generation, and human intervention is only required after several generations. The main contribution of this research is an approach to visualize the GA search data and improve the searching process of the GA with human intervention in different generations. Besides the selection of the best individuals or parents for the next generation, human intervention is required to propose a new individual in the search space. Active human intervention leads to faster searching, resulting in less user fatigue. The experiments were carried out by evolving the parameters to derive the rules for a parametric L-system. These rules are then used to model the growth process of branching structures in 3-D space. The experiments evaluated the ability of the proposed approach to converge to an optimized solution as compared to the Simple Genetic Algorithm (SGA).
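As a point of reference, one generation of a Simple Genetic Algorithm (the SGA baseline mentioned above) might be sketched as follows; the bit-string encoding, tournament selection, and parameter values are assumptions for illustration, and the paper's interactive visualization step is not modeled here:

```python
import random

def sga_generation(pop, fitness, rng, mut_rate=0.1):
    """One SGA generation over bit-string individuals: binary tournament
    selection, one-point crossover, and per-bit mutation."""
    nxt = []
    while len(nxt) < len(pop):
        parents = [max(rng.sample(pop, 2), key=fitness) for _ in range(2)]
        cut = rng.randrange(1, len(parents[0]))      # one-point crossover
        child = parents[0][:cut] + parents[1][cut:]
        # Flip each bit with probability mut_rate (int ^ bool stays 0/1).
        nxt.append([bit ^ (rng.random() < mut_rate) for bit in child])
    return nxt

rng = random.Random(42)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
pop = sga_generation(pop, sum, rng)
```

In the interactive variant described by the abstract, a human would inspect a 2-D plot of the gene values every few generations and inject a hand-crafted individual into `pop`.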


2021
Vol 2021
pp. 1-11
Author(s):
Abhishek Dixit
Akhilesh Tiwari
R. K. Gupta

The present paper proposes a new model for the exploration of hesitated patterns from multiple levels of a conceptual hierarchy in a transactional dataset. The usual practice of pattern mining has focused on identifying frequent patterns (i.e., items which occur together) in a transactional dataset, but overlooks vital information about patterns which are almost frequent (but not exactly frequent), called "hesitated patterns." The proposed model uses a reduced minimum support threshold (comprising two values: attractiveness and hesitation) and a constant minimum confidence threshold with a top-down progressive deepening approach for generating patterns, utilizing the apriori property. To validate the model, an online purchasing scenario of books through e-commerce platforms such as Amazon is considered, showing how various factors contribute to hesitation at the time of purchasing a book. The present work suggests a novel way of deriving hesitated patterns from multiple levels of the conceptual hierarchy with respect to the target dataset. Moreover, it is observed that the concepts and theories in the existing related work of Lu and Ng (2007) focus only on the introductory aspects of vague set theory-based hesitation association rule mining and cannot handle patterns at multiple levels of granularity, while the proposed model addresses the significant and untouched problem of mining "multilevel hesitated patterns" and is useful for exploring hesitated patterns at multiple levels of granularity based on the considered hesitation status in a transactional dataset.
These hesitated patterns can be further utilized by decision makers and business analysts to build strategies for increasing the attraction level of hesitated items (appearing in a particular transaction or set of transactions in a given dataset) to convert their state from hesitated to preferred items.
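A minimal sketch of the two-threshold idea (attractiveness vs. hesitation) described above; the threshold values, toy data, and size-2 itemset cap are illustrative assumptions, not the paper's full multilevel model:

```python
from collections import Counter
from itertools import combinations

def split_patterns(transactions, attract, hesitate):
    """Itemsets (size <= 2 for brevity) with relative support >= attract
    are frequent; those in [hesitate, attract) are 'hesitated'."""
    n = len(transactions)
    counts = Counter()
    for txn in transactions:
        for k in (1, 2):
            counts.update(combinations(sorted(txn), k))
    frequent = {i for i, c in counts.items() if c / n >= attract}
    hesitated = {i for i, c in counts.items()
                 if hesitate <= c / n < attract}
    return frequent, hesitated

txns = [{"book", "pen"}, {"book"}, {"book", "pen"}, {"pen"}]
frequent, hesitated = split_patterns(txns, attract=0.7, hesitate=0.4)
```

The pair {book, pen} here is an "almost frequent" pattern: a plain Apriori run with min support 0.7 would never report it, while the hesitation band surfaces it for analysts.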


2019
Vol 5
pp. e188
Author(s):
Hesam Hasanpour
Ramak Ghavamizadeh Meibodi
Keivan Navi

Classification and associative rule mining are two substantial areas in data mining. Some researchers attempt to integrate these two fields into what are called rule-based classifiers. Rule-based classifiers can play a very important role in applications such as fraud detection, medical diagnosis, etc. Numerous previous studies have shown that this type of classifier achieves higher classification accuracy than traditional classification algorithms. However, they still suffer from a fundamental limitation: many rule-based classifiers use various greedy techniques to prune redundant rules, which can discard important rules. Another challenge is the enormous set of mined rules, which results in high processing overhead. The consequence of these approaches is that the final selected rules may not be the globally best rules; these algorithms do not exploit the search space effectively in order to select the best subset of candidate rules. We merged the Apriori algorithm, Harmony Search, and the classification-based association rules (CBA) algorithm to build a rule-based classifier. We applied a modified version of the Apriori algorithm with multiple minimum supports for extracting useful rules for each class in the dataset. Instead of using a large number of candidate rules, binary Harmony Search was utilized to select the subset of rules best suited for building a classification model. We applied the proposed method to seventeen benchmark datasets and compared its results with traditional association rule classification algorithms. The statistical results show that our proposed method outperformed other rule-based approaches.
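The final prediction step of a CBA-style rule-based classifier can be sketched as a generic highest-confidence-rule-wins scheme (the rule triples shown are invented, and the paper's Harmony Search rule selection is not modeled here):

```python
def classify(instance, rules, default):
    """rules: (antecedent_itemset, label, confidence) triples.
    CBA-style: the highest-confidence rule whose antecedent is a subset
    of the instance fires; otherwise fall back to the default class."""
    for items, label, conf in sorted(rules, key=lambda r: -r[2]):
        if items <= instance:
            return label
    return default

# Toy rule base (antecedents and labels are illustrative).
rules = [({"overdue"}, "fraud", 0.8), ({"verified"}, "legit", 0.9)]
```

In the paper's pipeline, Apriori with multiple minimum supports would generate candidate rules per class and binary Harmony Search would choose which of them enter `rules` before this step runs.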


2020
Vol 8 (5)
pp. 2040-2044

Cloud technologies are booming in the field of information technology, but cloud computing sometimes results in failures. These failures demand more reliable frameworks with high availability of the computers acting as nodes. A request made by the user is replicated and sent to various VMs; if one of the VMs fails, another can respond, increasing reliability. A lot of research has been done, and is being carried out, to suggest various schemes for fault tolerance and thus increase reliability. Earlier schemes focus on only one way of dealing with faults, but the scheme proposed by the author in this paper is an adaptive scheme that deals with the issues related to fault tolerance in various cloud infrastructures. The proposed scheme uses adaptive behavior during the selection of replication and fine-grained checkpointing methods for attaining a reliable cloud infrastructure that can handle different client requirements. In addition, the algorithm determines the best-suited fault tolerance method for every designated virtual node. Zheng, Zhou, Lyu and King (2012).
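The replication idea described above can be sketched with a toy failover loop (the VM stand-ins are simulated local functions; a real implementation would issue RPCs with timeouts rather than catching a local exception):

```python
def send_with_failover(request, replicas):
    """Send the request to replicated VMs in order; the first healthy
    replica answers, so a single VM failure stays invisible to the client."""
    last_error = None
    for vm in replicas:
        try:
            return vm(request)
        except RuntimeError as exc:  # simulated VM failure / RPC timeout
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error

def failing_vm(request):
    raise RuntimeError("vm down")

def healthy_vm(request):
    return "ok:" + request
```

An adaptive scheme such as the one in the abstract would additionally decide, per virtual node, whether replication of this kind or checkpoint-and-restart is the cheaper way to meet the client's reliability requirement.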

