Parallel tree-projection-based sequence mining algorithms

2004 ◽  
Vol 30 (4) ◽  
pp. 443-472 ◽  
Author(s):  
Valerie Guralnik ◽  
George Karypis

2003 ◽  
Author(s):  
Valerie Guralnik ◽  
George Karypis


Entropy ◽  
2018 ◽  
Vol 20 (12) ◽  
pp. 923 ◽  
Author(s):  
Wenbing Chang ◽  
Zhenzhong Xu ◽  
Meng You ◽  
Shenghan Zhou ◽  
Yiyong Xiao ◽  
...  

This paper aims to predict failures from textual sequence data. Current failure prediction relies mainly on structured data, yet aircraft maintenance produces a large amount of unstructured data. Failure here refers to failure types, such as transmitter failure and signal failure, which a clustering algorithm derives from the failure text. The failure text is processed with natural language processing techniques. First, the Chinese failure text is segmented and stop words are removed. The study then applies the word2vec moving distance model to obtain the failure occurrence sequence for failure texts collected over a fixed period of time, and a clustering algorithm uses the resulting distances to obtain a representative set of fault types. Second, the failure occurrence sequence is mined with sequence mining algorithms such as PrefixSpan. Finally, the mined failure sequences are used to train a Bayesian failure network model. The experimental results show that the Bayesian failure network has higher accuracy for failure prediction.
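As an illustration of the sequence mining step mentioned in this abstract, the following is a minimal sketch of PrefixSpan for sequences in which each event is a single item. It is not the authors' implementation; the function name `prefixspan`, the `min_support` parameter, and the failure labels in the usage note are assumptions made for the example.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns.

    Simplified PrefixSpan: every event is a single item (no itemsets),
    and support is the number of sequences containing the pattern.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching `prefix` as early as possible.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_idx][start:]):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Build the projected database for the extended pattern.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

For example, calling `prefixspan([["transmitter", "signal", "power"], ["transmitter", "power"], ["signal", "transmitter", "power"]], 2)` on these hypothetical failure-type sequences reports `("transmitter", "power")` with support 3 and `("signal", "power")` with support 2, while `("transmitter", "signal")` is pruned because it occurs in only one sequence.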



Author(s):  
Pinar Senkul ◽  
Nilufer Onder ◽  
Soner Onder ◽  
Engin Maden ◽  
Hui Meen Nyew

The goal of computer architecture research is to design and build high-performance systems that make effective use of resources such as space and power. The design process typically involves a detailed simulation of the proposed architecture, followed by corrections and improvements based on the simulation results. Both simulator development and result analysis are very challenging tasks due to the inherent complexity of the underlying systems. The motivation of this work is to apply episode mining algorithms to a new domain, architecture simulation, and to prepare an environment for making predictions about the performance of programs on different architectures. We describe our tool, the Episode Mining Tool (EMT), which includes three temporal sequence mining algorithms, a preprocessor, and a visual analyzer. We present an empirical analysis of the episode rules that were mined from datasets obtained by running detailed micro-architectural simulations.
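The abstract does not say which three algorithms EMT includes. As a rough illustration of what episode mining measures, the sketch below computes a window-based frequency for a serial episode, in the spirit of sliding-window episode counting. It assumes one event per time step, and the event names used in the usage note are hypothetical.

```python
def contains_serial(window, episode):
    """True if the events of `episode` occur in `window` in order (gaps allowed)."""
    it = iter(window)
    # `event in it` advances the iterator, so order is enforced.
    return all(event in it for event in episode)


def episode_frequency(events, episode, width):
    """Fraction of fixed-width sliding windows that contain the serial episode."""
    if len(events) < width:
        return 0.0
    windows = [events[i:i + width] for i in range(len(events) - width + 1)]
    hits = sum(contains_serial(w, episode) for w in windows)
    return hits / len(windows)
```

With the hypothetical trace `["miss", "stall", "hit", "miss", "stall", "hit"]`, the episode `("miss", "stall")`, and a window width of 3, three of the four windows contain the episode, so the frequency is 0.75. An episode rule would then compare the frequency of an episode with that of its prefix.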



2004 ◽  
Author(s):  
Minghua Zhang


Author(s):  
Feng Xiong ◽  
Hongzhi Wang

Data mining remains an active research area. Knowledge graphs have grown rapidly in recent years and show strong potential, and acyclic knowledge graphs in particular have been observed to improve usability. Although knowledge graphs offer ample scope for applying data mining, related research is still limited. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel simple-path traversal pattern mining framework that improves the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked-list structure, indexed by sequence length, with convenient operations. The correctness of the algorithm is proven. Experiments show that our algorithm achieves high coverage with a small output size compared to existing frequent sequence mining algorithms.



2009 ◽  
Vol 50 ◽  
pp. 352-357
Author(s):  
Julija Pragarauskaitė ◽  
Gintautas Dzemyda

Probabilistic Algorithm for Mining Frequent Sequences

Frequent sequence mining in large databases is important for the analysis of biological, climate, financial, and many other kinds of data. Exact frequent sequence mining algorithms scan the whole database many times, so when the database is large, mining is slow or requires supercomputers. This paper proposes a new probabilistic algorithm for mining frequent sequences, which analyzes a random sample of the initial database constructed in a specific way. Statistical inferences about the frequent sequences in the initial database are drawn from the analysis of this sample. The algorithm is not exact, but it runs much faster than the exact algorithms and is suitable for exploratory statistical analysis. The error probabilities of the probabilistic algorithm are estimated using statistical methods. The algorithm can be combined with exact frequent sequence mining algorithms, and it can also be applied to the general structure mining problem.
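The abstract does not describe how the sample is constructed or how the error probabilities are derived. The sketch below only illustrates the general idea under simple assumptions: it draws a simple random sample without replacement and estimates the support of one candidate pattern with a normal-approximation confidence interval. The function names, the `seed` parameter, and the choice of interval are assumptions for the example, not the authors' method.

```python
import math
import random


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def estimate_support(database, pattern, sample_size, seed=0, z=1.96):
    """Estimate relative support of `pattern` from a random sample.

    Returns (estimate, lower, upper), where the bounds come from a
    normal-approximation interval (z=1.96 for roughly 95% coverage).
    """
    rng = random.Random(seed)
    sample = rng.sample(database, sample_size)  # without replacement
    p_hat = sum(is_subsequence(pattern, s) for s in sample) / sample_size
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

A pattern would then be reported as frequent when the lower bound exceeds the chosen support threshold. The interval is approximate and becomes unreliable for very small samples or supports close to 0 or 1.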



2021 ◽  
Author(s):  
Doruk Tiktiklar ◽  
Gursel Baltaoglu ◽  
Efsa Cakir ◽  
Zeynep Kucuk ◽  
Mehmet S. Aktas


2010 ◽  
Vol 51 ◽  
Author(s):  
Julija Pragarauskaitė ◽  
Gintautas Dzemyda

Frequent sequence mining in large databases is important in many areas, e.g., for biological, climate, and financial databases. Exact frequent sequence mining algorithms usually read the whole database many times, so if the database is large enough, frequent sequence mining takes very long or requires supercomputers. A new probabilistic algorithm for mining frequent sequences is proposed. It analyzes a random sample of the initial database. The algorithm makes decisions about the initial database according to the results of the random sample analysis and performs much faster than the exact mining algorithms. The probability of errors made by the probabilistic algorithm is estimated using statistical methods.



2019 ◽  
Vol 14 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Viswam Subeesh ◽  
Eswaran Maheswari ◽  
Hemendra Singh ◽  
Thomas Elsa Beulah ◽  
Ann Mary Swaroop

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis in the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. We considered DECs for disproportionality analysis only if at least ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, namely Reporting Odds Ratio (ROR) and Information Component (IC), were applied retrospectively over this time period. Values of ROR-1.96SE>1 and IC-2SD>0 were taken as the thresholds for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51]; age was not mentioned in twenty-one reports. The data mining algorithms showed a positive signal for akathisia (ROR-1.96SE=43.15, IC-2SD=2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), as these values were well above the pre-set thresholds. Conclusion: Five potential iloperidone-associated signals were generated by data mining in the FDA AERS database. The results call for further clinical surveillance to quantify and validate the possible risks of the adverse events reported for iloperidone.
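The two disproportionality measures named in this abstract can be sketched from a 2x2 contingency table, where `a` counts reports with both the drug and the event, `b` the drug without the event, `c` the event with other drugs, and `d` all remaining reports. This is a minimal sketch under assumptions: the ROR lower bound is taken as the lower limit of the 95% confidence interval on the log scale, the IC uses one common shrinkage formulation (adding 0.5 to observed and expected counts), and the IC standard deviation is omitted. The counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def ror_lower_bound(a, b, c, d, z=1.96):
    """Return (ROR, lower confidence limit) for a 2x2 table with nonzero cells."""
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, math.exp(math.log(ror) - z * se)


def information_component(a, b, c, d):
    """Point estimate of IC: log2 of observed over expected, with 0.5 shrinkage."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # expected count under independence
    return math.log2((a + 0.5) / (expected + 0.5))
```

For hypothetical counts `a=20, b=80, c=100, d=9800`, the ROR is 24.5 with a lower limit above 1, and the IC point estimate is positive, so this combination would be flagged for closer review under thresholds like those in the abstract.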


