Boosting Graph Alignment Algorithms

2021 ◽  
Author(s):  
Alexander Frederiksen Kyster ◽  
Simon Daugaard Nielsen ◽  
Judith Hermanns ◽  
Davide Mottin ◽  
Panagiotis Karras
Author(s):  
Renganayaki G. ◽  
Achuthsankar S. Nair

Sequence alignment algorithms and  database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto matrices are not optimal to align sequences accurately, for the proteins with markedly different compositional bias in the amino acid.   In this work, a new amino acid substitution matrix is calculated for the disorder and low complexity rich region of Hub proteins, based on residue characteristics. Insights into the amino acid background frequencies and the substitution scores obtained from the Hubsm unveils the  residue substitution patterns which differs from commonly used scoring matrices .When comparing the Hub protein sequences for detecting homologs,  the use of this Hubsm matrix yields better results than PAM and BLOSUM matrices. Usage of Hubsm matrix can be optimal in database search and for the construction of more accurate sequence alignments of Hub proteins.


2016 ◽  
Vol 11 (3) ◽  
pp. 375-381
Author(s):  
Yu Zhang ◽  
Jian Tai He ◽  
Yangde Zhang ◽  
Ke Zuo

2019 ◽  
Vol 12 (2) ◽  
pp. 128-134
Author(s):  
Sanjeev Kumar ◽  
Suneeta Agarwal ◽  
Ranvijay

Background: DNA and Protein sequences of an organism contain a variety of repeated structures of various types. These repeated structures play an important role in Molecular biology as they are related to genetic backgrounds of inherited diseases. They also serve as a marker for DNA mapping and DNA fingerprinting. Efficient searching of maximal and super maximal repeats in DNA/Protein sequences can lead to many other applications in the area of genomics. Moreover, these repeats can also be used for identification of critical diseases by finding the similarity between frequency distributions of repeats in viruses and genomes (without using alignment algorithms). Objective: The study aims to develop an efficient tool for searching maximal and super maximal repeats in large DNA/Protein sequences. Methods: The proposed tool uses a newly introduced data structure Induced Enhanced Suffix Array (IESA). IESA is an extension of enhanced suffix array. It uses induced suffix array instead of classical suffix array. IESA consists of Induced Suffix Array (ISA) and an additional array-Longest Common Prefix (LCP) array. ISA is an array of all sorted suffixes of the input sequence while LCP array stores the lengths of the longest common prefixes between all pairs of consecutive suffixes in an induced suffix array. IESA is known to be efficient w.r.t. both time and space. It facilitates the use of secondary memory for constructing the large suffix-array. Results: An open source standalone tool named MSR-IESA for searching maximal and super maximal repeats in DNA/Protein sequences is provided at https://github.com/sanjeevalg/MSRIESA. Experimental results show that the proposed algorithm outperforms other state of the art works w.r.t. to both time and space. Conclusion: The proposed tool MSR-IESA is remarkably efficient for the analysis of DNA/Protein sequences, having maximal and super maximal repeats of any length. It can be used for identification of well-known diseases.


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 388
Author(s):  
Nikolet Doneva ◽  
Irini Doytchinova ◽  
Ivan Dimitrov

The assessment of immunogenicity of biopharmaceuticals is a crucial step in the process of their development. Immunogenicity is related to the activation of adaptive immunity. The complexity of the immune system manifests through numerous different mechanisms, which allows the use of different approaches for predicting the immunogenicity of biopharmaceuticals. The direct experimental approaches are sometimes expensive and time consuming, or their results need to be confirmed. In this case, computational methods for immunogenicity prediction appear as an appropriate complement in the process of drug design. In this review, we analyze the use of various In silico methods and approaches for immunogenicity prediction of biomolecules: sequence alignment algorithms, predicting subcellular localization, searching for major histocompatibility complex (MHC) binding motifs, predicting T and B cell epitopes based on machine learning algorithms, molecular docking, and molecular dynamics simulations. Computational tools for antigenicity and allergenicity prediction also are considered.


2021 ◽  
Author(s):  
Kristoffer Sahlin

Short-read genome alignment is a fundamental computational step used in many bioinformatic analyses. It is therefore desirable to align such data as fast as possible. Most alignment algorithms consider a seed-and-extend approach. Several popular programs perform the seeding step based on the Burrows-Wheeler Transform with a low memory footprint, but they are relatively slow compared to more recent approaches that use a minimizer-based seeding-and-chaining strategy. Recently, syncmers and strobemers were proposed for sequence comparison. Both protocols were designed for improved conservation of matches between sequences under mutations. Syncmers is a thinning protocol proposed as an alternative to minimizers, while strobemers is a linking protocol for gapped sequences and was proposed as an alternative to k-mers. The main contribution in this work is a new seeding approach that combines syncmers and strobemers. We use a strobemer protocol (randstrobes) to link together syncmers (i.e., in syncmer-space) instead of over the original sequence. Our protocol allows us to create longer seeds while preserving mapping accuracy. A longer seed length reduces the number of candidate regions which allows faster mapping and alignment. We also contribute the insight that speed-wise, this protocol is particularly effective when syncmers are canonical. Canonical syncmers can be created for specific parameter combinations and reduce the computational burden of computing the non-canonical randstrobes in reverse complement. We implement our idea in a proof-of-concept short-read aligner strobealign that aligns short reads 3-4x faster than minimap2 and 15-23x faster than BWA and Bowtie2. Many implementation versions of, e.g., BWA, achieve high speed on specific hardware. Our contribution is algorithmic and requires no hardware architecture or system-specific instructions. Strobealign is available at https://github.com/ksahlin/StrobeAlign.


Author(s):  
Tristan J. Mahr ◽  
Visar Berisha ◽  
Kan Kawabata ◽  
Julie Liss ◽  
Katherine C. Hustad

Purpose Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time consuming. Forced-alignment algorithms automate this process by aligning a transcript and a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers. Method The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated alignment algorithms in terms of accuracy (whether the interval covers the midpoint of the manual alignment) and difference in phone-onset times between the automatic and manual intervals. Results The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all the aligners, and alignment accuracy increased with age for fricative sounds across the aligners too. Conclusion The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors. Supplemental Material https://doi.org/10.23641/asha.14167058


Sign in / Sign up

Export Citation Format

Share Document