EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences

DOI 10.1101/566299, 2019

Author(s): Xinzhou Ge, Haowen Zhang, Lingjue Xie, Wei Vivian Li, Soo Bin Kwon, ...

Keyword(s): Dynamic Programming Algorithm, Real Data, Chromatin State, Programming Algorithm, Local Alignment, Alignment Algorithm, Genome Wide, Bioinformatic Tool, Cell Type Specific, NIH Roadmap

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm whose novelty lies in accounting for the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
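
To make the idea of a local alignment that accounts for state lengths and frequencies concrete, here is a minimal Python sketch. It is not the published EpiAlign scoring scheme: the run-length compression, the `-log(frequency)` state weights, the length-ratio discount, and the function names `compress`, `state_weights`, and `local_align_score` are all illustrative assumptions layered on a standard Smith-Waterman recurrence.

```python
import math
from collections import Counter
from itertools import groupby


def compress(states):
    """Collapse consecutive identical states into (state, run_length) pairs."""
    return [(s, len(list(g))) for s, g in groupby(states)]


def state_weights(*sequences):
    """Assumed weighting: -log of relative frequency, so rarer states weigh more."""
    counts = Counter(s for seq in sequences for s in seq)
    total = sum(counts.values())
    return {s: -math.log(c / total) for s, c in counts.items()}


def local_align_score(a, b, gap=1.0):
    """Best Smith-Waterman-style local score over run-length-compressed inputs."""
    w = state_weights(a, b)
    runs_a, runs_b = compress(a), compress(b)
    best = 0.0
    prev = [0.0] * (len(runs_b) + 1)
    for state_a, len_a in runs_a:
        cur = [0.0]
        for j, (state_b, len_b) in enumerate(runs_b, start=1):
            if state_a == state_b:
                # Matching runs score by rarity, discounted when lengths differ.
                sub = w[state_a] * min(len_a, len_b) / max(len_a, len_b)
            else:
                sub = -max(w[state_a], w[state_b])
            score = max(0.0, prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

Each character stands for one chromatin state per genomic bin, so `local_align_score("AAABBC", "AAABBC")` is positive while two sequences with no shared states score zero.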

PSRna: Prediction of small RNA secondary structures based on reverse complementary folding method

Journal of Bioinformatics and Computational Biology, DOI 10.1142/s0219720016430010, 2016, Vol 14 (04), pp. 1643001

Cited By ~ 1

Author(s): Jin Li, Chengzhen Xu, Lei Wang, Hong Liang, Weixing Feng, ...

Keyword(s): Free Energy, Secondary Structure, Small RNA, Small RNAs, Dynamic Programming Algorithm, Real Data, Secondary Structures, Minimum Free Energy, Programming Algorithm, RNA Secondary Structures

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms, which usually generate multi-branch loops and 5′ end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures using a dynamic programming algorithm, constructing a path matrix at the same time. Second, the backtracking paths are extracted from the path matrix by a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered by their free energy, and only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
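
The first step described above, estimating the maximum number of base pairs by dynamic programming, can be illustrated with the classic Nussinov recurrence. This is a minimal sketch, not PSRna itself: it omits the path matrix, the reverse complementary folding, and the free-energy filter, and the pairing rules (including G-U wobble pairs), the `min_loop` parameter, and the function name `max_base_pairs` are assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in an RNA string.

    min_loop is the minimum number of unpaired bases enclosed by any pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the hairpin `GGGAAAUCC`, the three stem pairs around the `AAA` loop give a count of 3.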

PIPI: PTM-Invariant Peptide Identification Using Coding Method

DOI 10.1101/055806, 2016

Cited By ~ 1

Author(s): Fengchao Yu, Ning Li, Weichuan Yu

Keyword(s): Amino Acids, Dynamic Programming Algorithm, Computational Cost, Peptide Identification, Real Data, Search Space, Database Search, Programming Algorithm, Post Translational Modification, Coding Method

In computational proteomics, identifying peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with the number of modifiable amino acids and linearly with the number of potential PTM types at each amino acid, so enumerating all possible modification patterns quickly becomes intractable. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid this enumeration in database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.

Multiple change-points detection in high dimension

Random Matrices: Theory and Applications, DOI 10.1142/s201032631950014x, 2019, Vol 08 (04), pp. 1950014

Cited By ~ 1

Author(s): Yunlong Wang, Changliang Zou, Zhaojun Wang, Guosheng Yin

Keyword(s): Dynamic Programming Algorithm, Null Distribution, Real Data, Information Criterion, Change Point Detection, New Method, Change Points, Estimation Accuracy, Programming Algorithm, Order Structure

Change-point detection is an integral component of statistical modeling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are typically inapplicable. We propose a novel test statistic that combines a modified Euclidean distance with an extreme statistic, and whose null distribution is asymptotically normal. The new method naturally strikes a balance between the detection abilities for dense and sparse changes, which gives it an edge that may allow it to outperform existing methods. Furthermore, the number of change-points is determined by a new Schwarz’s information criterion together with a pre-screening procedure, and the locations of the change-points can be estimated via the dynamic programming algorithm in conjunction with the intrinsic order structure of the objective function. Under some mild conditions, we show that the new method provides consistent estimation with an almost optimal rate. Simulation studies show that the proposed method performs satisfactorily in identifying multiple change-points in terms of power and estimation accuracy, and two real data examples are used for illustration.
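
The following Python sketch illustrates how dynamic programming can locate change-points once their number is fixed. It is a simplified illustration only: it works on a univariate series and minimizes the within-segment sum of squared errors, which stands in for the paper's high-dimensional objective function. The function name `locate_change_points` and its arguments are hypothetical.

```python
def locate_change_points(x, k):
    """Return k change-point indices that minimize within-segment squared error.

    An index t in the result means a new segment starts at x[t].
    Requires len(x) >= k + 1.
    """
    n = len(x)
    # Prefix sums give each segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean over x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting x[:j] into m + 1 segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(1, n + 1):
        best[0][j] = cost(0, j)
    for m in range(1, k + 1):
        for j in range(m + 1, n + 1):
            for t in range(m, j):  # t is the start of the last segment
                c = best[m - 1][t] + cost(t, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = t
    # Walk the back-pointers from the end of the series.
    points, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        points.append(j)
    return sorted(points)
```

The triple loop costs O(k n^2) time, which is the usual price of exact segmentation with a fixed number of segments.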

REDUCING THE SEARCH SPACE AND TIME COMPLEXITY OF NEEDLEMAN-WUNSCH ALGORITHM (GLOBAL ALIGNMENT) AND SMITH-WATERMAN ALGORITHM (LOCAL ALIGNMENT) FOR DNA SEQUENCE ALIGNMENT

Jurnal Teknologi, DOI 10.11113/jt.v77.6564, 2015, Vol 77 (20)

Cited By ~ 1

Author(s): F. N. Muhamad, R. B. Ahmad, S. Mohd. Asi, M. N. Murad

Keyword(s): Dynamic Programming, DNA Sequences, Sequence Comparison, Dynamic Programming Algorithm, Search Space, Programming Algorithm, Local Alignment, Global Alignment, Main Research, DNA Sequence Alignment

The fundamental procedure of analyzing sequence content is sequence comparison, which can be defined as the problem of finding which parts of two sequences are similar and which parts are different. A typical approach to this problem is to find a good and reasonable alignment between the two sequences. The main aim of this project is to align DNA sequences using the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment, both of which are based on dynamic programming. A dynamic programming algorithm is guaranteed to find an optimal alignment by exploring all possible alignments and choosing the best through scoring and traceback techniques. The algorithms proposed and evaluated aim to reduce the gaps in the aligned sequences as well as the length of the sequences aligned, without compromising the quality or correctness of the results. To verify the accuracy and consistency of the measurements obtained from the Needleman-Wunsch and Smith-Waterman algorithms, the results are compared with EMBOSS (global) and EMBOSS (local) on test data of 600 strands.
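
For reference, the scoring phase of the two baseline algorithms can be written in a few lines of Python. This sketch shows the textbook recurrences only, not the reduced-search-space variants the paper proposes. The linear gap penalty, the default scores (match 1, mismatch -1, gap -1), and the function names are assumptions, and traceback is omitted for brevity.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score, computed one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal local alignment score: cells are floored at 0, best cell wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

The only differences between the two are the boundary initialization and the floor at zero: global alignment pays for every unaligned prefix, while local alignment may restart anywhere.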

Time Efficient Segmented Technique for Dynamic Programming Based Algorithms with FPGA Implementation

Journal of Circuits, Systems and Computers, DOI 10.1142/s021812661950227x, 2019, Vol 28 (13), pp. 1950227

Author(s): Talal Bonny, Ridhwan Al Debsi, Mohamed Basel Almourad

Keyword(s): Dynamic Programming, Sequence Alignment, Input Parameter, Dynamic Programming Algorithm, Computation Time, Longest Common Subsequence, Programming Algorithm, Optimization Approach, Alignment Algorithm, Optimal Sequence

Although dynamic programming (DP) is an optimization approach used to solve complex problems quickly, its running time still grows polynomially with the size of the input and can be prohibitive. In this contribution, we improve the computation time of dynamic programming based algorithms by proposing a novel technique called “SDP: Segmented Dynamic Programming”. SDP finds the best way of splitting the compared sequences into segments and then applies the dynamic programming algorithm to each segment individually, which reduces the computation time dramatically. SDP may be applied to any dynamic programming based algorithm to improve its computation time. As case studies, we apply the SDP technique to two different dynamic programming based algorithms: “Needleman–Wunsch (NW)”, the widely used algorithm for optimal sequence alignment, and the LCS algorithm, which finds the “Longest Common Subsequence” between two input strings. The results show that applying the SDP technique in conjunction with the DP based algorithms improves the computation time by up to 80% in comparison to the DP algorithms alone, with small or negligible degradation in the quality of the results. This degradation is controllable through the number of split segments, which is an input parameter. We also compare our results with the well-known heuristic FASTA sequence alignment algorithm “GGSEARCH” and show that our results are much closer to the optimal results than those of “GGSEARCH”. The results hold independently of the sequence lengths and their level of similarity. To show the functionality of our technique in hardware and to verify the results, we implement it on the Xilinx Zynq-7000 FPGA.
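
The segmentation idea can be sketched in Python for the LCS case study. One important simplification: the abstract says SDP finds the best way of splitting the sequences, whereas this sketch simply cuts both strings into equal-length pieces, which is a placeholder assumption. The function names `lcs_length`, `split_even`, and `segmented_lcs_length` are hypothetical.

```python
def lcs_length(a, b):
    """Standard O(len(a) * len(b)) DP for the longest common subsequence length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def split_even(s, parts):
    """Cut s into `parts` contiguous pieces of near-equal length (naive split)."""
    n = len(s)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [s[bounds[p]:bounds[p + 1]] for p in range(parts)]


def segmented_lcs_length(a, b, segments):
    """Apply the LCS DP to corresponding segments and sum the results.

    The sum is a lower bound on the true LCS length, because concatenating
    the per-segment common subsequences yields a valid common subsequence.
    """
    pieces = zip(split_even(a, segments), split_even(b, segments))
    return sum(lcs_length(x, y) for x, y in pieces)
```

With s segments, the table work drops from roughly |a|·|b| cells to |a|·|b|/s, at the cost of missing matches that cross segment boundaries. That is the trade-off between speed and result quality that the segment-count parameter controls.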
