support threshold
Recently Published Documents



2021 ◽  
pp. 1-10
Author(s):  
Aamir Ali ◽  
Muhammad Asim

Generally, big interaction networks keep the interaction records of actors over a certain period. With the rapid growth in the number of users of these networks, the demand for frequent subgraph mining on large databases is increasingly intense. However, most existing studies of frequent subgraphs have not considered the temporal information of the graph. To fill this research gap, this article presents a novel temporal frequent subgraph-based mining algorithm (TFSBMA) using Spark. TFSBMA employs frequent subgraph mining with a minimum threshold in a Spark environment. The proposed algorithm analyzes the temporal frequent subgraph (TFS) using the Frequent Subgraph Mining Based Using Spark (FSMBUS) method with a minimum support threshold and evaluates its frequency in a temporal manner. Furthermore, based on the FSMBUS results, the study computes TFS using an incremental update strategy. Experimental results show that the proposed algorithm can accurately and efficiently compute all the TFS with their corresponding frequencies. In addition, we applied the proposed algorithm to a real-world dataset with artificial time information, which confirms the practical usability of the proposed algorithm.


2021 ◽  
Vol 2021 ◽  
pp. 1-23
Author(s):  
Yalun Zhang ◽  
Lin He ◽  
Guo Cheng

A fault diagnosis rule extraction method oriented to machine foot signals, based on the dynamic support threshold and association coefficient interestingness (DST-ACI) discriminant criterion, is proposed in this paper. The new method includes three main innovations. First, the feature state coding method based on K-means clustering fully takes into account the imbalanced distribution of signal feature values caused by noise interference, and divides the signal feature values into several range intervals to generate the feature state code. Second, the frequent feature pattern mining method based on the dynamic support threshold (DST) discriminant criterion can dynamically adjust the support threshold according to the frequency of the feature states in each candidate pattern. Third, the fault diagnosis rule extraction method based on the association coefficient interestingness (ACI) discriminant criterion introduces a new metric called ACI to evaluate the correlation between the pattern and the fault. Four types of fault simulation experiments were carried out, and the performance of the DST-ACI method was tested using the collected vibration signals. The results show that, compared with the coding method based on equal-width discretization or equal-density discretization, the accuracy of the transactional dataset generated by the feature state coding method based on K-means clustering is higher. Compared with the frequent feature pattern mining method based on the constant support threshold criterion, the patterns mined by the DST-based criterion generally have higher support. Compared with the existing confidence-lift-based and confidence-improint-based fault diagnosis rule extraction frameworks, the positive correlation between the feature states and the fault type of the rules extracted by the DST-ACI framework is generally stronger.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Jiawei Li

To address the difficulty of setting the support threshold for sequential pattern mining algorithms, and to improve the effectiveness of threshold setting without guidance from domain experts, an improved SPADE (sequential pattern discovery using equivalence classes) algorithm is proposed. The support threshold is selected dynamically by analyzing the relationship between the number of frequent sequences and the support threshold. Using electronic medical record data from a medical centre, the time-series relationship of the drugs taken by hypertension patients was extracted as the drug sequence dataset. After the optimal support threshold of the dataset is determined, the frequent sequence set is mined, and sequence rules are generated from the obtained sequences and visualized. The sequence rules of medication for hypertensive patients were combined with the patients’ physical indicators for the recommendation. For patients with obstetric hypertension, a combination of nifedipine and captopril is recommended. The curative effect of various drugs was studied by comparing an observation group with a control group. The results showed that the total effective rate of the observation group was about 96.6%, and the difference from the control group was significant (P < 0.05). The comparison of blood pressure levels between the two groups after treatment also showed that the results of the observation group were ideal (P < 0.05). In addition, the incidence of postpartum haemorrhage and perinatal complications in the observation group was also significantly reduced (P < 0.05). Therefore, the combination of medication for pregnancy hypertension syndrome can effectively improve the treatment effect of the disease and reduce the rate of postpartum haemorrhage and the incidence of perinatal complications.
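The abstract does not say how the threshold is derived from the count of frequent sequences. As a minimal sketch of one plausible heuristic (an assumption, not the paper's method), the snippet below sweeps candidate thresholds, counts the patterns that remain frequent at each, and picks the candidate just after the steepest drop in that count. The pattern labels, support values, and function names are all hypothetical.

```python
# Hypothetical pattern -> support table; labels such as "A>B" stand for
# "A followed by B" and the numbers are made up for illustration.
pattern_supports = {
    "A": 0.90, "A>B": 0.60, "B": 0.55,
    "D": 0.13, "A>C": 0.12, "C>A": 0.11, "C>B": 0.10,
    "B>C": 0.08, "C": 0.07,
}


def count_frequent(supports, threshold):
    """Number of patterns whose support meets the threshold."""
    return sum(s >= threshold for s in supports.values())


def pick_threshold(supports, candidates):
    """Return (threshold, counts): the candidate right after the steepest
    drop in the frequent-pattern count, plus the count per candidate.
    This elbow-style rule is an assumed heuristic."""
    candidates = sorted(candidates)
    counts = [count_frequent(supports, t) for t in candidates]
    drops = [counts[i] - counts[i + 1] for i in range(len(counts) - 1)]
    steepest = max(range(len(drops)), key=drops.__getitem__)
    return candidates[steepest + 1], counts


threshold, counts = pick_threshold(pattern_supports, [0.05, 0.1, 0.2, 0.4, 0.8])
print(threshold, counts)
```

On this toy table the count falls from 7 to 3 between the candidates 0.1 and 0.2, so 0.2 is chosen: the low-support tail is cut while the three well-supported patterns survive.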


2021 ◽  
Vol 48 (4) ◽  
Author(s):  
Hafiz I. Ahmad ◽  
Alex T. H. Sim ◽  
Roliana Ibrahim ◽  
Mohammad Abrar ◽  
...  

Association rule mining (ARM) is used for discovering frequent itemsets that reveal interesting associative and correlative relationships within the data. This gives new insights of great value, both commercial and academic. Traditional ARM techniques discover interesting association rules based on a predefined minimum support threshold. However, there is no known standard for choosing an exact minimum support value, and an inappropriate value may result in missing important rules. In addition, most of the rules discovered by these traditional ARM techniques refer to already known knowledge. To address these limitations of the minimum support threshold in ARM techniques, this study proposes an algorithm to mine interesting association rules without minimum support, using predicate logic and a property of a proposed interestingness measure (g measure). The algorithm scans the database and uses the g measure’s property to search for interesting combinations. The selected combinations are mapped to pseudo-implications, and inference rules of logic are applied to the pseudo-implications to produce and validate the predicate rules. Experimental results show that the proposed technique performs better than state-of-the-art classification techniques, and that reliable predicate rules are discovered based on the reliability differences between the presence and absence of the rule’s consequence.
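To make the limitation concrete, here is a minimal sketch of the traditional minimum-support framework that the abstract contrasts itself with (it does not implement the proposed g measure). The transactions and the helper names `support` and `rules_above_min_support` are illustrative assumptions.

```python
from itertools import combinations

# Toy transactions; the data are made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_above_min_support(transactions, min_support):
    """Enumerate single-item rules lhs -> rhs whose item pair is frequent.
    Each rule is (lhs, rhs, pair_support, confidence)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # pruned by the minimum support threshold
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            rules.append((lhs, rhs, pair_support, confidence))
    return rules


print(len(rules_above_min_support(transactions, 0.5)))  # four rules survive
print(len(rules_above_min_support(transactions, 0.6)))  # none survive
```

Raising the threshold from 0.5 to 0.6 discards every rule in this toy dataset, including butter -> bread, which holds with confidence 1.0. That is the kind of loss an inappropriate minimum support value can cause.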


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Abhishek Dixit ◽  
Akhilesh Tiwari ◽  
R. K. Gupta

The present paper proposes a new model for the exploration of hesitated patterns from multiple levels of conceptual hierarchy in the transactional dataset. The usual practice of mining patterns has focused on identifying frequent patterns (i.e., items which occur together) in the transactional dataset, but it leaves uncovered the vital information about patterns that are almost frequent (but not exactly frequent), called “hesitated patterns.” The proposed model uses a reduced minimum support threshold (containing two values: attractiveness and hesitation) and a constant minimum confidence threshold with a top-down progressive deepening approach for generating patterns, utilizing the apriori property. To validate the model, an online purchasing scenario of books through e-commerce-based online shopping platforms such as Amazon has been considered, and it is shown how various factors contributed to building hesitation to purchase a book at the time of purchasing. The present work suggests a novel way of deriving hesitated patterns from multiple levels in the conceptual hierarchy with respect to the target dataset. Moreover, it is observed that the concepts and theories available in the existing related work of Lu and Ng (2007) focus only on the introductory aspect of vague set theory-based hesitation association rule mining, which is not useful for handling patterns from multiple levels of granularity. The proposed model, in contrast, is complete in nature and addresses the significant and untouched problem of mining “multilevel hesitated patterns”; it is useful for exploring hesitated patterns from multiple levels of granularity based on the considered hesitation status in a transactional dataset. These hesitated patterns can be further utilized by decision makers and business analysts to build a strategy for increasing the attraction level of such hesitated items (appearing in a particular transaction or set of transactions in a given dataset) so as to convert their state from hesitated to preferred.


Energies ◽  
2021 ◽  
Vol 14 (14) ◽  
pp. 4228
Author(s):  
Yan Xu ◽  
Mingyu Wang ◽  
Wen Fan

The fault data of the secondary system of smart substations hide some information that the association analysis algorithm can mine. The convergence speed of the Apriori algorithm and FP-growth algorithm is slow, and there is a lack of indicators to evaluate the correlation of association rules and the method to determine the parameter threshold. In this paper, the H-mine algorithm is used to realize the fast mining of fault data. The algorithm can traverse data faster by using the data structure of the H-struct. This paper also sets the lift and CF value to screen the association rules with good correlation. When setting the three key parameters of association analysis, namely, support threshold, confidence threshold, and lift threshold, an objective function composed of weighted average lift, CF value, and data coverage rate was selected, and the adaptive fireworks algorithm was used to optimize the parameters in the association analysis. In particular, the rule screening strategy is introduced in fault cause analysis in this paper. By eliminating rules with high similarity, derived signals in association rules are eliminated to the greatest extent to improve the readability of rules and ensure easy understanding of results.


Author(s):  
Ashley Newton

This study investigates how public charities respond to the public support test, an IRS requirement that at least one-third of a public charity’s financial support be derived from public sources. Using a large sample of 836,920 charity-year observations during 2009-2018, I find that a disproportionately large number of charities exceed the 33⅓% public support threshold by a small margin. This result holds only for public charities actually subject to the test (six years of age or older) and not for young charities that automatically retain public charity status. Further, I find that charities that unexpectedly just meet the public support test are more likely to understate fundraising expenses. This evidence implies that the public support levels of charities that just surpass the 33⅓% threshold are likely misrepresented. Overall, my findings provide new insights into a vitally important regulatory threshold that has been largely neglected in existing research.


2021 ◽  
Vol 30 (04) ◽  
pp. 2150018
Author(s):  
Anindita Borah ◽  
Bhabesh Nath

Most pattern mining techniques focus almost exclusively on identifying frequent patterns, and very little attention has been paid to the generation of rare patterns. However, in several domains, recognizing less frequent but strongly related patterns has a greater advantage than recognizing frequent ones. Identification of compelling and meaningful rare associations among such patterns may prove to be significant for air quality management, which has become an indispensable task in today’s world. The rare correlations between air pollutants and other parameters may aid in restricting air pollution to a manageable level. To this end, efficient and competent rare pattern mining techniques are needed that can generate the complete set of rare patterns and further identify significant rare association rules among them. Moreover, a notable issue with databases is their continuous update over time due to the addition of new records. Users’ requirements or behavior may change with the incremental update of databases, which makes it difficult to determine a suitable support threshold for the extraction of interesting rare association rules. This paper presents an efficient rare pattern mining technique to capture the complete set of rare patterns from a real environmental dataset. The proposed approach does not restart the entire mining process upon a threshold update and generates the complete set of rare association rules in a single database scan. It can effectively perform incremental mining and also gives the user the flexibility to regulate the value of the support threshold for generating the rare patterns. Significant rare association rules representing correlations between air pollutants and other environmental parameters are further extracted from the generated rare patterns to identify the substantial causes of air pollution. Performance analysis shows that the proposed method is more efficient than existing rare pattern mining approaches in providing significant directions to domain experts for air pollution monitoring.
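As a rough illustration of what "rare pattern" means with respect to a support threshold, the sketch below uses brute-force enumeration to list itemsets that occur at least once but whose support stays below a chosen maximum. It is not the single-scan incremental technique the abstract describes; the item labels, the `max_size` cap, and the function name are assumptions.

```python
from itertools import combinations

# Hypothetical transactions with placeholder item labels.
transactions = [
    {"a", "b"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "d"},
    {"c", "d"},
]


def rare_itemsets(transactions, max_support, min_support=0.0, max_size=2):
    """Return {itemset: support} for itemsets that occur at least once
    and satisfy min_support <= support < max_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= t for t in transactions)
            sup = count / n
            if count > 0 and min_support <= sup < max_support:
                result[candidate] = sup
    return result


rare = rare_itemsets(transactions, max_support=0.5)
print(sorted(rare))
```

Because the user can change `max_support` freely, a brute-force version like this has to be rerun on every change. Avoiding that restart is the cost the abstract says its approach removes.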


Author(s):  
Gourav Garg ◽  
Ashutosh Sharma* ◽  
Anshul Arora

Over the past few years, malware attacks on the Android platform have risen sharply. These attacks pose significant threats and may cause financial loss, information leakage, and damage to the system. Around 25 million smartphones were infected with malware within the first half of 2019, which indicates the seriousness of these attacks. Taking into account the danger posed by Android malware to the user community, we aim to develop a static Android malware detector named SFDroid that analyzes manifest file components for malware detection. In this work, the proposed model first ranks the manifest features according to their frequency in normal and malicious apps. This helps us identify the significant features present in the normal and malware datasets. Additionally, we apply support thresholds to remove unnecessary and redundant features from the rankings. Further, we propose a novel algorithm that uses the ranked features and several machine learning classifiers to detect Android malware. The experimental results demonstrate that, using the Random Forest classifier at a 10% support threshold, the proposed model gives a detection accuracy of 95.90% with 36 manifest components.
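The ranking-and-pruning step can be sketched as follows, assuming that a feature's "support" is the fraction of apps whose manifest contains it. The abstract does not spell out the exact definition, so this is an assumption, and the app lists, counts, and the function name `rank_features` are made up for illustration.

```python
from collections import Counter

# Hypothetical dataset: each app is the list of manifest features it declares.
apps = (
    [["INTERNET", "SEND_SMS"]] * 6
    + [["INTERNET"]] * 13
    + [["INTERNET", "READ_LOGS"]] * 1
)


def rank_features(apps, min_support):
    """Rank features by the fraction of apps containing them, dropping
    features whose support falls below min_support."""
    n = len(apps)
    # set(app) ensures a feature is counted once per app.
    counts = Counter(feature for app in apps for feature in set(app))
    kept = [(f, c / n) for f, c in counts.items() if c / n >= min_support]
    # Highest support first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


for feature, sup in rank_features(apps, min_support=0.10):
    print(f"{feature}: {sup:.2f}")
```

With a 10% threshold, `READ_LOGS` (present in 1 of 20 apps) is dropped, while `INTERNET` and `SEND_SMS` are kept in ranked order. In a setup like the one described, such a ranking would be computed separately for the normal and malicious datasets before being passed to a classifier.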


2021 ◽  
Author(s):  
Bin Wu ◽  
Yimin Mao ◽  
Deborah Simon Mwakapesa ◽  
Yaser Ahangari Nanehkaran ◽  
Qianhu Deng ◽  
...  

Association rule (AR) mining is considered to be one of the models for data mining. With the growth of datasets, conventional association rule algorithms are not suitable for big data mining, which has aroused the interest of a large number of scholars in algorithm innovation. This study aims to design an optimized parallel association rule mining algorithm based on MapReduce, named the PMRARIM-IEG algorithm, to deal with problems such as the excessive space occupied by the CanTree (Canonical order Tree), the inability to dynamically set the support threshold, and the time-consuming data transmission in the Map and Reduce phases. Firstly, a strategy called SIM-IE (similar items merging based on information entropy) is adopted to reduce the space occupation of the CanTree effectively. Then, DST-GA (dynamic support threshold obtaining using genetic algorithm) is proposed to obtain a relatively optimal dynamic support threshold in the big data environment. Finally, in the MapReduce parallel process, an LZO (Lempel-Ziv-Oberhumer) data compression strategy is used to compress the output data of the Map stage, which improves the speed of data transmission. We compared the PMRARIM-IEG algorithm with other algorithms on five datasets: Wikipedia, LiveJournal, com-amazon, kosarak, and webdocs. The experimental results demonstrate that the proposed algorithm, PMRARIM-IEG, not only reduces the space and time complexity but also obtains a well-performing speed-up ratio in a big data environment.

