Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree

In bioinformatics applications, existing algorithms cannot directly and efficiently perform sequence pattern mining. This paper proposes two fast and efficient biological sequence pattern mining algorithms, one for a single biological sequence and one for multiple sequences. The concept of the basic pattern is introduced: frequent basic patterns are mined first, and frequent patterns are then mined by constructing prefix trees over the frequent basic patterns. The proposed algorithms thus mine frequent patterns of biological sequences rapidly on the basis of pattern prefix trees. In the experiments, family sequence data from the Pfam protein database are used to verify the performance of the proposed algorithms. The results confirm that the proposed algorithms not only obtain mining results with biological significance but also improve the running time of biological sequence pattern mining.
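The prefix-tree idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define a "basic pattern", so this sketch simply assumes patterns are contiguous substrings of a single sequence, and the names `TrieNode`, `build_prefix_tree` and `frequent_patterns` are hypothetical.

```python
class TrieNode:
    """One node of the pattern prefix tree (hypothetical helper)."""
    def __init__(self):
        self.children = {}
        self.count = 0  # occurrences of the pattern spelled by the path to this node


def build_prefix_tree(sequence, max_len):
    """Insert every substring of length <= max_len and count its occurrences."""
    root = TrieNode()
    for start in range(len(sequence)):
        node = root
        for symbol in sequence[start:start + max_len]:
            node = node.children.setdefault(symbol, TrieNode())
            node.count += 1
    return root


def frequent_patterns(root, min_support):
    """Walk the tree depth-first, pruning as soon as a prefix is infrequent.

    Pruning is safe because an extension of a contiguous pattern can never
    occur more often than the pattern itself.
    """
    found = {}
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for symbol, child in node.children.items():
            if child.count >= min_support:
                pattern = prefix + symbol
                found[pattern] = child.count
                stack.append((pattern, child))
    return found
```

For example, `frequent_patterns(build_prefix_tree("ABABAB", 3), 3)` keeps `A`, `AB` and `B` and discards `BA`, `ABA` and `BAB`, which occur only twice.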

Customized frequent patterns mining algorithms for enhanced Top-Rank-K frequent pattern mining

Expert Systems with Applications ◽

10.1016/j.eswa.2020.114530 ◽

2021 ◽

Vol 169 ◽

pp. 114530

Author(s):

Areej Ahmad Abdelaal ◽

Sa'ed Abed ◽

Mohammad Al-Shayeji ◽

Mohammad Allaho

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Mining Algorithms

Association rule based frequent pattern mining in biological sequences

2013 IEEE International Conference on Computational Intelligence and Computing Research ◽

10.1109/iccic.2013.6724203 ◽

2013 ◽

Cited By ~ 1

Author(s):

A Salim ◽

S. S. Vinod Chandra

Keyword(s):

Association Rule ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Biological Sequences ◽

Rule Based

Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases

Electronics ◽

10.3390/electronics10121478 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1478

Author(s):

Penugonda Ravikumar ◽

Palla Likhitha ◽

Bathala Venus Vikranth Raj ◽

Rage Uday Kiran ◽

Yutaka Watanobe ◽

...

Keyword(s):

Real World ◽

Pattern Mining ◽

Transportation Network ◽

Frequent Pattern Mining ◽

Temporal Databases ◽

Frequent Pattern ◽

Frequent Patterns ◽

Temporal Database ◽

First Case

Discovering periodic-frequent patterns in temporal databases is a challenging problem of great importance in many real-world applications. Although several algorithms have been described in the literature to tackle the problem of periodic-frequent pattern mining, most of them use the traditional horizontal (or row) database layout; as a consequence, they either need to scan the database several times or do not allow asynchronous computation of periodic-frequent patterns. This kind of database layout therefore makes the algorithms for discovering periodic-frequent patterns inefficient in both time and memory. One cannot ignore the importance of mining data stored in a vertical (or columnar) database layout, because real-world big data is widely stored in columnar layouts. With this motivation, this paper proposes an efficient algorithm, Periodic Frequent-Equivalence CLass Transformation (PF-ECLAT), to find periodic-frequent patterns in a columnar temporal database. Experimental results on sparse and dense real-world and synthetic databases demonstrate that PF-ECLAT is memory and runtime efficient and highly scalable. Finally, we demonstrate the usefulness of PF-ECLAT with two case studies. In the first case study, we employ our algorithm to identify the geographical areas in which people were periodically exposed to harmful levels of air pollution in Japan. In the second case study, we use our algorithm to discover the set of road segments in which congestion was regularly observed in a transportation network.
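A rough sketch of how mining over a columnar layout might look is given below. It is an illustration under stated assumptions, not the published PF-ECLAT code: each item is assumed to map to a sorted list of timestamps, support is the length of that list, and periodicity is the largest gap between consecutive timestamps, including the gaps to an assumed database start `ts_start` and end `ts_end`. The function names are hypothetical.

```python
def max_period(ts_list, ts_start, ts_end):
    """Largest gap between consecutive occurrences, including both database ends."""
    points = [ts_start] + ts_list + [ts_end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def pf_eclat_sketch(ts_lists, min_sup, max_per, ts_end, ts_start=0):
    """ECLAT-style search: patterns grow by intersecting timestamp lists."""
    def qualifies(ts):
        return len(ts) >= min_sup and max_period(ts, ts_start, ts_end) <= max_per

    results = {}

    def extend(siblings):
        # siblings: (pattern tuple, timestamp list) pairs that share a prefix
        for i, (pattern_a, ts_a) in enumerate(siblings):
            results[pattern_a] = ts_a
            next_level = []
            for pattern_b, ts_b in siblings[i + 1:]:
                shared = sorted(set(ts_a) & set(ts_b))
                if qualifies(shared):
                    next_level.append((pattern_a + (pattern_b[-1],), shared))
            if next_level:
                extend(next_level)

    singles = [((item,), sorted(ts)) for item, ts in sorted(ts_lists.items())]
    extend([(pattern, ts) for pattern, ts in singles if qualifies(ts)])
    return results
```

Because the timestamp list of a longer pattern is always a subset of the lists of its sub-patterns, support can only fall and periodicity can only rise as patterns grow, so discarding a non-qualifying intersection never loses a valid pattern.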

Maintenance of Frequent Patterns

Post-Mining of Association Rules ◽

10.4018/978-1-60566-404-0.ch014 ◽

2009 ◽

pp. 273-293 ◽

Cited By ~ 1

Author(s):

Mengling Feng ◽

Jinyan Li ◽

Guozhu Dong ◽

Limsoon Wong

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Research Opportunities ◽

Prefix Tree ◽

Emerging Trends ◽

Maintenance Problem

This chapter surveys the maintenance of frequent patterns in transaction datasets. It is written to be accessible to researchers familiar with the field of frequent pattern mining. The frequent pattern maintenance problem is summarized with a study on how the space of frequent patterns evolves in response to data updates. This chapter focuses on incremental and decremental maintenance. Four major types of maintenance algorithms are studied: Apriori-based, partition-based, prefix-tree-based, and concise-representation-based algorithms. The authors study the advantages and limitations of these algorithms from both the theoretical and experimental perspectives. Possible solutions to certain limitations are also proposed. In addition, some potential research opportunities and emerging trends in frequent pattern maintenance are also discussed.

An Efficient Frequent Patterns Mining Algorithm over Data Streams Based on FPD-Graph

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.4457 ◽

2012 ◽

Vol 433-440 ◽

pp. 4457-4462 ◽

Cited By ~ 1

Author(s):

Jun Shan Tan ◽

Zhu Fang Kuang ◽

Guo Gui Yang

Keyword(s):

Data Streams ◽

Data Stream ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Data Generation ◽

Experiment Data ◽

Mining Algorithm ◽

Head Node

The design of the synopsis structure is an important issue in frequent pattern mining over data streams. This paper proposes FPD-Graph, a data stream synopsis structure based on a directed graph. The FPD-Graph contains a list head node, FPDG-Head, and a list node, FPDG-Node. The operations on the FPD-Graph consist of an insert operation and a deletion operation. The paper also proposes DGFPM, a frequent pattern mining algorithm based on a sliding window over the data stream. Data produced by the IBM synthetic data generator, which outputs customer shopping data, are used as the experimental data. The DGFPM algorithm not only achieves high precision in mining frequent patterns but also has a low processing time.
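The sliding-window idea with insert and deletion operations can be sketched as follows. The abstract does not describe the internals of FPD-Graph, so this sketch swaps in a plain counter of itemsets as the synopsis; the class `SlidingWindowMiner` and its parameters are hypothetical and do not reproduce DGFPM.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Counts itemsets over the most recent `window_size` transactions.

    Assumption: a Counter stands in for the FPD-Graph synopsis structure.
    """

    def __init__(self, window_size, max_len=2):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len  # longest itemset that is tracked
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def insert(self, transaction):
        """Add a new transaction and expire the oldest one if the window is full."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            self.delete_oldest()

    def delete_oldest(self):
        """Remove the oldest transaction and undo its contribution to the counts."""
        expired = self.window.popleft()
        self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        return {p: c for p, c in self.counts.items() if c >= min_support}
```

Enumerating every itemset up to `max_len` is only practical for short transactions; a real synopsis structure would exist precisely to avoid that cost.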

BIG DATA MINING FOR INTERESTING PATTERNS WITH MAP REDUCE TECHNIQUE

Asian Journal of Pharmaceutical and Clinical Research ◽

10.22159/ajpcr.2017.v10s1.19634 ◽

2017 ◽

Vol 10 (13) ◽

pp. 191

Author(s):

Nikhil Jamdar ◽

A Vijayalakshmi

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Uncertain Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Map Reduce ◽

Frequent Patterns ◽

Precise Data ◽

Big Data Mining ◽

Transactional Databases

Many data mining algorithms are available for finding interesting patterns in transactional databases of precise data. Frequent pattern mining is a technique for finding the items that occur frequently in the data. Most existing techniques find all interesting patterns in a collection of precise data, where the items occurring in each transaction are known to the system with certainty. Moreover, in many real-time applications, users are interested in only a tiny portion of the large set of frequent patterns. The proposed user-constrained mining approach therefore helps to find the frequent patterns in which the user is interested. It efficiently finds these patterns by applying user constraints to collections of uncertain data. Users specify their interests in the form of constraints, and the Map Reduce model is used to find the uncertain frequent patterns that satisfy the user-specified constraints.
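The combination of user constraints, uncertain data and a map/reduce split can be sketched in plain Python. This is an illustration only, under assumptions the abstract does not state: each transaction maps items to existence probabilities, items are treated as independent, and a pattern's expected support is the sum over transactions of the product of its item probabilities. The functions `map_phase`, `reduce_phase` and `mine` are hypothetical and run in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def map_phase(transaction, constraint, max_len=2):
    """Emit (itemset, expected contribution) pairs for one uncertain transaction."""
    items = sorted(transaction)
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            if constraint(itemset):  # user constraint is applied early, in the mapper
                yield itemset, prod(transaction[item] for item in itemset)


def reduce_phase(pairs):
    """Sum the contributions of each itemset into its expected support."""
    totals = defaultdict(float)
    for itemset, value in pairs:
        totals[itemset] += value
    return totals


def mine(transactions, constraint, min_exp_sup, max_len=2):
    pairs = (pair for t in transactions for pair in map_phase(t, constraint, max_len))
    totals = reduce_phase(pairs)
    return {itemset: v for itemset, v in totals.items() if v >= min_exp_sup}
```

Applying the constraint inside the mapper means that patterns the user does not care about are never emitted, which reduces the data passed to the reduce step.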

Frequent Pattern Mining Algorithms for Finding Associated Frequent Patterns for Data Streams: A Survey

Procedia Computer Science ◽

10.1016/j.procs.2014.08.019 ◽

2014 ◽

Vol 37 ◽

pp. 109-116 ◽

Cited By ~ 21

Author(s):

Shamila Nasreen ◽

Muhammad Awais Azam ◽

Khurram Shehzad ◽

Usman Naeem ◽

Mustansar Ali Ghazanfar

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Mining Algorithms

Generative probabilistic biological sequence models that account for mutational variability

10.1101/2020.07.31.231381 ◽

2020 ◽

Author(s):

Eli N. Weinstein ◽

Debora S. Marks

Keyword(s):

Large Scale ◽

Sequence Data ◽

Disordered Proteins ◽

Biological Sequences ◽

Biological Sequence ◽

Multiple Sequence ◽

Continuous Space ◽

Future Evolution ◽

Disordered Protein ◽

Latent Representations

Large-scale sequencing has revealed extraordinary diversity among biological sequences, produced over the course of evolution and within the lifetime of individual organisms. Existing methods for building statistical models of sequences often pre-process the data using multiple sequence alignment, an unreliable approach for many genetic elements (antibodies, disordered proteins, etc.) that is subject to fundamental statistical pathologies. Here we introduce a structured emission distribution (the MuE distribution) that accounts for mutational variability (substitutions and indels) and use it to construct generative and predictive hierarchical Bayesian models (H-MuE models). Our framework enables the application of arbitrary continuous-space vector models (e.g. linear regression, factor models, image neural-networks) to unaligned sequence data. Theoretically, we show that the MuE generalizes classic probabilistic alignment models. Empirically, we show that H-MuE models can infer latent representations and features for immune repertoires, predict functional unobserved members of disordered protein families, and forecast the future evolution of pathogens.

MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure

10.7287/peerj.preprints.26471v1 ◽

2018 ◽

Author(s):

Weina Li ◽

Jiadong Ren

Keyword(s):

Pattern Mining ◽

Index Structure ◽

Sequence Mining ◽

Biological Sequence ◽

Sequence Database ◽

Sequence Pattern ◽

Space Efficiency ◽

Database Index ◽

Low Efficiency ◽

Sequence Position

Extracting meaningful patterns from biological sequence data is an important approach to discovering the biological regulatory rules of genes and proteins and their inheritance relationships. Existing sequence pattern discovery algorithms, such as MSPM and FBSB, suffer from low efficiency and accuracy. To deal with this issue, this paper presents a new algorithm for biological sequence pattern mining, abbreviated MpBsmi, based on a data index structure. The MpBsmi algorithm employs a sequence position table, abbreviated ST, and a sequence database index structure named DB-Index for data storage, mining and pattern expansion. The ST and DB-Index of single items are first obtained by scanning the sequence database once. Then a new algorithm for fast support counting is developed to mine the table ST and identify the frequent single items. Based on a recursive connection strategy, the frequent patterns are expanded and the expanded table ST is updated by scanning the DB-Index. The fast support counting algorithm is used to obtain the frequent expansion patterns. Finally, a new pruning technique is developed for extended patterns to avoid generating an unnecessarily large number of candidate patterns. Experimental results on multiple classical protein sequences from the Pfam database validate the performance of the proposed algorithm, including its accuracy, stability and scalability. The results show that the proposed algorithm achieves better space efficiency, stability and scalability than MSPM and FBSB, the two main algorithms for biological sequence mining.
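The position-table approach described above can be approximated with a short sketch. It is not the MpBsmi implementation: the exact layout of ST and DB-Index is not given in the abstract, so the sketch assumes a table that maps each pattern to the (sequence id, end position) pairs where it occurs, and it counts support as the number of distinct sequences containing the pattern. The function `mine_contiguous` is hypothetical.

```python
from collections import defaultdict


def mine_contiguous(sequences, min_support):
    """Grow frequent contiguous patterns one symbol at a time from a position table."""
    # Single scan: symbol -> [(sequence id, end position)] (assumed table layout)
    table = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, symbol in enumerate(seq):
            table[symbol].append((sid, pos))

    def support(occurrences):
        return len({sid for sid, _ in occurrences})

    frequent = {}
    frontier = {p: occ for p, occ in table.items() if support(occ) >= min_support}
    while frontier:
        next_frontier = {}
        for pattern, occurrences in frontier.items():
            frequent[pattern] = support(occurrences)
            grown = defaultdict(list)
            for sid, end in occurrences:
                if end + 1 < len(sequences[sid]):
                    # Extend the occurrence by the symbol that follows it
                    grown[pattern + sequences[sid][end + 1]].append((sid, end + 1))
            for longer, occ in grown.items():
                if support(occ) >= min_support:  # infrequent extensions are pruned
                    next_frontier[longer] = occ
        frontier = next_frontier
    return frequent
```

After the initial scan, the sequences are touched only at positions recorded in the table, so each expansion step costs time proportional to the number of occurrences of the current patterns rather than to the size of the database.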

Mapping Biomolecular Sequences: Graphical Representations - their Origins, Applications and Future Prospects

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207324666210510164743 ◽

2021 ◽

Vol 24 ◽

Author(s):

Ashesh Nandy

Keyword(s):

Dna Sequences ◽

Graphical Representation ◽

Sequence Data ◽

Basic Unit ◽

Graphical Representations ◽

Biological Sequences ◽

Biological Sequence ◽

New Approach ◽

3D Space ◽

2D And 3D

The exponential growth in the repositories of biological sequence data has generated an urgent need to store, retrieve and analyse the data efficiently and effectively, for which the standard practice of using alignment procedures is not adequate because of its high demand on computing resources and time. Graphical representation of sequences has become one of the most popular alignment-free strategies for analysing biological sequences: each basic unit of a sequence (the bases adenine, cytosine, guanine and thymine for DNA/RNA, and the 20 amino acids for proteins) is plotted on a multi-dimensional grid. The resulting curve in 2D and 3D space, and the implied graph in higher dimensions, provide a perception of the underlying information in the sequences through visual inspection; numerical analyses of the plots, in geometrical or matrix terms, provide a measure of comparison between sequences and thus enable the study of sequence hierarchies. The new approach has also enabled comparisons of DNA sequences over many thousands of bases and provided new insights into the structure of the base composition of DNA sequences. In this article we briefly review the origins and applications of graphical representations and highlight future perspectives in this field.
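A 2D graphical representation of the kind described above can be sketched as a walk on a grid. The abstract does not say which base moves in which direction, so the axis assignment in `STEPS` is an assumption made for illustration, and the centroid is used here as one simple numerical descriptor rather than as the measure used in the reviewed work.

```python
# Assumed axis convention (not stated in the abstract): one unit step per base.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk_2d(sequence):
    """Return the grid point reached after each base of a DNA sequence."""
    x = y = 0
    points = []
    for base in sequence.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def centroid(points):
    """Mean position of the walk, usable as a crude descriptor for comparison."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
```

Two sequences can then be compared without alignment by comparing such descriptors, for instance the distance between their centroids. The sketch handles only the four DNA bases and raises `KeyError` on any other symbol.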
