Parallel tree-projection-based sequence mining algorithms

Valerie Guralnik; George Karypis

doi:10.1016/j.parco.2004.03.003

Parallel Formulations of Tree-Projection Based Sequence Mining Algorithms

10.21236/ada439587

2003

Author(s):

Valerie Guralnik

George Karypis

Keyword(s):

Sequence Mining

Mining Algorithms

A Bayesian Failure Prediction Network Based on Text Sequence Mining and Clustering

Entropy

10.3390/e20120923

2018

Vol 20 (12)

pp. 923

Cited By ~ 4

Author(s):

Wenbing Chang

Zhenzhong Xu

Meng You

Shenghan Zhou

Yiyong Xiao

...

Keyword(s):

Language Processing

Clustering Algorithm

Sequence Data

Failure Prediction

Structured Data

Sequence Mining

Text Data

Distance Model

Mining Algorithms

Fixed Period

The purpose of this paper is to predict failures from textual sequence data. Current failure prediction relies mainly on structured data, yet aircraft maintenance produces a large amount of unstructured data. Here, a failure refers to a failure type, such as transmitter failure or signal failure, obtained by clustering the failure texts. The failure texts are processed with natural language processing techniques. First, the Chinese failure texts are segmented and stop words are removed. The study then applies the word2vec moving distance model to the failure texts collected over a fixed period of time to obtain the failure occurrence sequence, and a clustering algorithm uses these distances to group the texts into typical fault types. Second, the failure occurrence sequence is mined with sequence mining algorithms such as PrefixSpan. Finally, the mined failure sequences are used to train the Bayesian failure network model. The experimental results show that the Bayesian failure network achieves higher accuracy in failure prediction.
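The abstract names PrefixSpan for the sequence mining step but gives no details. As a minimal sketch under simplifying assumptions (each sequence element is a single failure type, and support counts sequences rather than occurrences), the generic PrefixSpan below grows frequent prefixes by recursively projecting the database onto the suffixes that follow each frequent item. It is not the authors' implementation; `prefixspan`, `min_support`, and the failure sequences are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[List[str]], min_support: int) -> List[Tuple[Pattern, int]]:
    """Simplified PrefixSpan over sequences of single items.

    A pattern is supported by a sequence if it occurs in it in order (gaps
    allowed); support counts sequences, not occurrences.
    """
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        # Count each item at most once per projected sequence.
        counts: Dict[str, int] = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            new_prefix = prefix + (item,)
            results.append((new_prefix, counts[item]))
            # Project the database: keep what follows the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, [s for s in suffixes if s])

    mine((), db)
    return results


# Hypothetical failure-type sequences, one per aircraft and observation period.
failures = [
    ["transmitter", "signal", "power"],
    ["transmitter", "power"],
    ["signal", "transmitter", "power"],
]
patterns = dict(prefixspan(failures, min_support=2))
# patterns[("transmitter", "power")] == 3 and patterns[("signal", "power")] == 2
```

Projection is what distinguishes PrefixSpan from Apriori-style miners: each recursive call scans only the shrinking suffix database instead of generating candidates and testing them against the full data.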

Discovering Patterns for Architecture Simulation by Using Sequence Mining

Pattern Discovery Using Sequence Data Mining

10.4018/978-1-61350-056-9.ch013

2012

pp. 212-236

Author(s):

Pinar Senkul

Nilufer Onder

Soner Onder

Engin Maden

Hui Meen Nyew

Keyword(s):

Computer Architecture

High Performance

Domain Architecture

Sequence Mining

Episode Mining

Detailed Simulation

Architecture Simulation

Result Analysis

Effective Use

Mining Algorithms

The goal of computer architecture research is to design and build high-performance systems that make effective use of resources such as space and power. The design process typically involves a detailed simulation of the proposed architecture, followed by corrections and improvements based on the simulation results. Both simulator development and result analysis are very challenging tasks because of the inherent complexity of the underlying systems. The motivation of this work is to apply episode mining algorithms to a new domain, architecture simulation, and to build an environment for predicting the performance of programs on different architectures. We describe our tool, the Episode Mining Tool (EMT), which includes three temporal sequence mining algorithms, a preprocessor, and a visual analyzer. We present an empirical analysis of the episode rules mined from datasets obtained by running detailed micro-architectural simulations.
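The abstract does not say which three temporal sequence mining algorithms EMT implements. To illustrate the general idea of episode mining, the sketch below counts the frequency of a serial episode (an ordered list of event types) as the fraction of sliding time windows that contain it, in the style of WINEPI. This is an assumption about the kind of algorithm involved, not EMT's code; the event names, window width, and `serial_episode_frequency` are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, event type)


def occurs_in_order(events: List[Event], episode: List[str]) -> bool:
    """True if the episode's event types appear in `events` in order, gaps allowed."""
    i = 0
    for _, etype in events:
        if i < len(episode) and etype == episode[i]:
            i += 1
    return i == len(episode)


def serial_episode_frequency(events: List[Event], episode: List[str], width: int) -> float:
    """WINEPI-style frequency: share of windows [s, s + width) that contain the episode.

    Assumes `events` is sorted by distinct integer timestamps. Windows slide by one
    time unit and may overhang either end, giving (end - start) + width - 1 windows.
    """
    if not events:
        return 0.0
    start = events[0][0]
    end = events[-1][0] + 1            # the sequence covers [start, end)
    first = start - width + 1          # earliest window still overlapping the sequence
    total = end - first
    hits = 0
    for s in range(first, end):
        window = [e for e in events if s <= e[0] < s + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / total


# Hypothetical micro-architectural simulation trace.
trace = [(1, "fetch"), (2, "decode"), (4, "stall"), (5, "fetch"), (6, "stall")]
freq = serial_episode_frequency(trace, ["fetch", "stall"], width=3)  # 2 of 8 windows
```

This brute-force scan rebuilds every window from scratch; WINEPI itself updates episode recognition incrementally as the window slides, which matters for long simulation traces.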

Sequence mining algorithms

10.5353/th_b4457011

2004

Author(s):

Minghua Zhang

Keyword(s):

Sequence Mining

Mining Algorithms

Mining Simple Path Traversal Patterns in Knowledge Graph

Journal of Web Engineering

10.13052/jwe1540-9589.2128

2022

Author(s):

Feng Xiong

Hongzhi Wang

Keyword(s):

Data Mining

Pattern Mining

Divide And Conquer

Simple Path

Knowledge Graph

Sequence Mining

High Coverage

List Structure

Life Force

Mining Algorithms

Data mining has long remained an attractive research subject. Knowledge graphs have grown rapidly in recent years and show strong potential for further development, and acyclic knowledge graphs have been observed to enhance usability. Although knowledge graphs offer ample scope for applying data mining, related research is still insufficient. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel framework for mining simple path traversal patterns that improves the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked list structure indexed by sequence length, with convenient operations. The correctness of the algorithm is proven. Experiments show that our algorithm achieves high coverage with a small output compared to existing frequent sequence mining algorithms.
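The abstract does not describe the divide-and-conquer combination step or the exact linked-list layout. As a brute-force baseline under stated assumptions, the sketch below treats a pattern as the sequence of relation labels along a simple path (no repeated nodes), counts how often each label sequence occurs, and buckets the frequent ones by length. A Python dict of lists stands in for the length-indexed linked list; the graph, `simple_path_labels`, and `frequent_by_length` are hypothetical and are not the authors' algorithm.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]          # head -> [(relation, tail), ...]
Buckets = DefaultDict[int, List[Tuple[Tuple[str, ...], int]]]


def simple_path_labels(graph: Graph, max_len: int) -> List[Tuple[str, ...]]:
    """Relation-label sequences of every simple path with 1..max_len edges."""
    found: List[Tuple[str, ...]] = []

    def dfs(node: str, visited: Set[str], labels: List[str]) -> None:
        if labels:
            found.append(tuple(labels))
        if len(labels) == max_len:
            return
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:                 # simple path: never revisit a node
                visited.add(nxt)
                labels.append(rel)
                dfs(nxt, visited, labels)
                labels.pop()
                visited.remove(nxt)

    for start in graph:
        dfs(start, {start}, [])
    return found


def frequent_by_length(paths: List[Tuple[str, ...]], min_count: int) -> Buckets:
    """Keep patterns seen at least min_count times, bucketed by pattern length."""
    buckets: Buckets = defaultdict(list)
    for pattern, count in sorted(Counter(paths).items()):
        if count >= min_count:
            buckets[len(pattern)].append((pattern, count))
    return buckets


# Hypothetical acyclic knowledge graph.
kg: Graph = {
    "Alice": [("worksAt", "AcmeCorp"), ("livesIn", "Paris")],
    "Bob": [("worksAt", "AcmeCorp"), ("livesIn", "Paris")],
    "AcmeCorp": [("locatedIn", "Paris")],
    "Paris": [("locatedIn", "France")],
}
buckets = frequent_by_length(simple_path_labels(kg, max_len=2), min_count=2)
```

Exhaustive enumeration grows quickly with path length, which is the cost a divide-and-conquer strategy that combines shorter paths aims to avoid.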

Tikimybinis dažnų posekių paieškos algoritmas (Probabilistic Algorithm for Mining Frequent Sequences)

Informacijos mokslai

10.15388/im.2009.0.3211

2009

Vol 50

pp. 352-357

Author(s):

Julija Pragarauskaitė

Gintautas Dzemyda

Keyword(s):

Random Sample

Statistical Methods

Large Volume

Sequence Mining

Sample Analysis

Probabilistic Algorithm

Frequent Sequence

Mining Algorithms

Mining frequent sequences in large databases is important for analyzing biological, climate, financial, and many other kinds of databases. Exact frequent sequence mining algorithms scan the whole database many times, so when the database is large, the search is slow or requires supercomputers. The paper proposes a new probabilistic algorithm for mining frequent sequences that analyzes a random sample of the initial database, constructed in a particular way. Based on this analysis, statistical inferences are drawn about the frequent sequences in the initial database. The algorithm is not exact, but it runs much faster than exact algorithms and is suitable for exploratory statistical analysis. The error probabilities of the probabilistic algorithm are estimated using statistical methods. The probabilistic algorithm can be combined with exact frequent sequence mining algorithms, and it can also be applied to the general problem of structure mining.
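The abstract does not specify how the sample is constructed or which statistical method bounds the error. The sketch below substitutes two plainly named assumptions: uniform random sampling with replacement, and Hoeffding's inequality, which guarantees that the sample frequency lies within eps = sqrt(ln(2/delta) / (2n)) of the true frequency with probability at least 1 - delta. A pattern is labeled frequent or infrequent only when the whole confidence band falls on one side of the threshold. `classify_by_sample` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    return all(item in it for item in pattern)   # `in` consumes the iterator


def classify_by_sample(db: List[List[str]], pattern: List[str], min_freq: float,
                       n: int, delta: float, seed: int = 0) -> str:
    """Label `pattern` 'frequent', 'infrequent' or 'uncertain' from n sampled sequences.

    Sampling is uniform with replacement, so by Hoeffding's inequality the
    sample frequency is within eps of the true frequency with probability
    at least 1 - delta.
    """
    rng = random.Random(seed)
    sample = [rng.choice(db) for _ in range(n)]
    sample_freq = sum(is_subsequence(pattern, s) for s in sample) / n
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    if sample_freq - eps >= min_freq:
        return "frequent"
    if sample_freq + eps < min_freq:
        return "infrequent"
    return "uncertain"   # the sample is too small to decide at this confidence


db = [["a", "b", "c"], ["a", "x", "c"], ["b", "c"]]
label = classify_by_sample(db, ["a", "c"], min_freq=0.5, n=2000, delta=0.05)
```

The "uncertain" outcome is where such a sampling pass could hand off to an exact algorithm, which matches the abstract's remark that the two kinds of algorithm can be combined.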

On the Comparative Analysis of Sequence Mining Algorithms: Case Study in Telecommunications

10.1109/ubmk52708.2021.9558935

2021

Author(s):

Doruk Tiktiklar

Gursel Baltaoglu

Efsa Cakir

Zeynep Kucuk

Mehmet S. Aktas

Keyword(s):

Comparative Analysis

Sequence Mining

Mining Algorithms

Probabilistic algorithm for mining frequent sequences

Lietuvos matematikos rinkinys

10.15388/lmr.2010.57

2010

Vol 51

Author(s):

Julija Pragarauskaitė

Gintautas Dzemyda

Keyword(s):

Random Sample

Statistical Methods

Large Volume

Sequence Mining

Sample Analysis

Probabilistic Algorithm

Frequent Sequence

Mining Algorithms

Frequent sequence mining in large databases is important in many areas, e.g., for biological, climate, and financial databases. Exact frequent sequence mining algorithms usually read the whole database many times, so if the database is large enough, mining frequent sequences takes a very long time or requires supercomputers. A new probabilistic algorithm for mining frequent sequences is proposed. It analyzes a random sample of the initial database, makes decisions about the initial database based on the results of the sample analysis, and runs much faster than exact mining algorithms. The probability of errors made by the probabilistic algorithm is estimated using statistical methods.

Novel Adverse Events of Iloperidone: A Disproportionality Analysis in US Food and Drug Administration Adverse Event Reporting System (FAERS) Database

Current Drug Safety

10.2174/1574886313666181026100000

2019

Vol 14 (1)

pp. 21-26

Cited By ~ 2

Author(s):

Viswam Subeesh

Eswaran Maheswari

Hemendra Singh

Thomas Elsa Beulah

Ann Mary Swaroop

Keyword(s):

Data Mining

Adverse Event

Adverse Events

Reporting System

Adverse Event Reporting System

Adverse Event Reporting

Disproportionality Analysis

Positive Signal

Data Mining Algorithms

Mining Algorithms

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis of the FDA Adverse Event Reporting System (FAERS) database using data mining algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated drug-event combinations (DECs) reported from 2010 Q1 to 2016 Q3. A DEC was considered for disproportionality analysis only if the database contained at least ten reports of the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, the Reporting Odds Ratio (ROR) and the Information Component (IC), were applied retrospectively over this time period. Values of ROR-1.96SE > 1 and IC-2SD > 0 were taken as the thresholds for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51], although age was not stated in twenty-one reports. The data mining algorithms showed positive signals, well above the preset thresholds, for akathisia (ROR-1.96SE = 43.15, IC-2SD = 2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09), and sexual dysfunction (26.6, 1.5). Conclusion: Data mining of the FDA AERS database generated five potential signals associated with iloperidone. Further clinical surveillance is needed to quantify and validate the possible risks of the adverse events reported for iloperidone.
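ROR and IC are disproportionality measures computed from a 2x2 contingency table of report counts. The sketch below is a minimal illustration under stated assumptions: it reads "ROR-1.96SE" as the lower limit of the 95% confidence interval computed on the log scale (the usual convention), and it computes only a shrunk IC point estimate, log2((observed + 0.5) / (expected + 0.5)), not the Bayesian credibility interval behind "IC-2SD". The `Table2x2` class, the function names, and the counts are hypothetical and do not reproduce the study's FAERS figures.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Report counts for one drug-event combination (all cells assumed non-zero).

    a: target drug, target event     b: target drug, all other events
    c: other drugs, target event     d: other drugs, all other events
    """
    a: int
    b: int
    c: int
    d: int


def ror_lower_bound(t: Table2x2) -> float:
    """Lower 95% limit of the reporting odds ratio: exp(ln ROR - 1.96 * SE)."""
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)   # SE of ln(ROR)
    return math.exp(math.log(ror) - 1.96 * se)


def information_component(t: Table2x2) -> float:
    """Shrunk IC point estimate: log2((observed + 0.5) / (expected + 0.5))."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n    # count expected under independence
    return math.log2((t.a + 0.5) / (expected + 0.5))


example = Table2x2(a=20, b=980, c=200, d=98800)   # hypothetical counts, not FAERS data
lower = ror_lower_bound(example)        # about 6.34, above the threshold of 1
ic = information_component(example)     # about 2.92 (point estimate only)
```

A drug-event pair reported in the same proportion as the background, such as a=10, b=990, c=1000, d=99000, gives ROR = 1 and a lower bound below 1, so it would not count as a signal.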

Detection of acute lymphoblastic leukemia using image segmentation and data mining algorithms

Medical & Biological Engineering & Computing

10.1007/s11517-019-01984-1

2019

Vol 57 (8)

pp. 1783-1811

Cited By ~ 4

Author(s):

Vasundhara Acharya

Preetham Kumar

Keyword(s):

Data Mining

Image Segmentation

Acute Lymphoblastic Leukemia

Lymphoblastic Leukemia

Data Mining Algorithms

Mining Algorithms