alignment algorithm

2022 ◽  
Author(s):  
Adam Zemla ◽  
Jonathan E. Allen ◽  
Dan Kirshner ◽  
Felice C. Lightstone

We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions (spheres) adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. Currently, the PDBspheres library contains more than 2 million spheres, organized to facilitate searches by sequence and/or structure similarity of protein-ligand binding sites or interfaces between interacting molecules. PDBspheres uses the LGA structure alignment algorithm as the main engine for detecting structure similarities between the protein of interest and library spheres. An all-atom structure similarity metric ensures that sidechain placement is taken into account in the PDBspheres primary assessment of confidence in structural matches. In this paper, we (1) describe the PDBspheres method, (2) demonstrate how PDBspheres can be used to detect and characterize binding sites in protein structures, (3) compare the use of PDBspheres for binding site prediction with seven other binding site prediction methods using a curated dataset of 2,528 ligand-bound and ligand-free crystal structures, and (4) use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of the 4,876 structures in the refined set of the PDBbind 2019 dataset. The PDBspheres library is made publicly available for download at https://proteinmodel.org/AS2TS/PDBspheres


Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7201
Author(s):  
Christian Permann ◽  
Thomas Seidel ◽  
Thierry Langer

Chemical features of small molecules can be abstracted to 3D pharmacophore models, which are easy to generate, interpret, and adapt by medicinal chemists. Three-dimensional pharmacophores can be used to efficiently match and align molecules according to their chemical feature pattern, which facilitates the virtual screening of even large compound databases. Existing alignment methods, used in computational drug discovery and bio-activity prediction, are often not suitable for finding matches between pharmacophores accurately as they purely aim to minimize RMSD or maximize volume overlap, when the actual goal is to match as many features as possible within the positional tolerances of the pharmacophore features. As a consequence, the obtained alignment results are often suboptimal in terms of the number of geometrically matched feature pairs, which increases the false-negative rate, thus negatively affecting the outcome of virtual screening experiments. We addressed this issue by introducing a new alignment algorithm, Greedy 3-Point Search (G3PS), which aims at finding optimal alignments by using a matching-feature-pair maximizing search strategy while at the same time being faster than competing methods.
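The difference between minimizing RMSD and maximizing matched feature pairs can be illustrated with a small sketch. This is not G3PS itself: it only scores two already-aligned poses. The feature tuples, the greedy nearest-neighbour matching, and the names `count_matched_features` and `rmsd` are assumptions made for illustration.

```python
import math

def count_matched_features(reference, query):
    """Count reference features that have an unused, same-type query
    feature inside their tolerance sphere (greedy nearest match)."""
    used = set()
    matched = 0
    for ftype, center, tol in reference:
        best, best_d = None, None
        for j, (qtype, qpos) in enumerate(query):
            if j in used or qtype != ftype:
                continue
            d = math.dist(center, qpos)
            if d <= tol and (best_d is None or d < best_d):
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matched += 1
    return matched

def rmsd(reference, query):
    """RMSD over features paired by index (assumes equal length and order)."""
    sq = [math.dist(center, pos) ** 2
          for (_, center, _), (_, pos) in zip(reference, query)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical pharmacophore: (feature type, centre, tolerance radius).
reference = [("HBD", (0.0, 0.0, 0.0), 1.0),
             ("HBA", (3.0, 0.0, 0.0), 1.0),
             ("AR",  (0.0, 4.0, 0.0), 1.0)]

# Pose A: every feature is 0.8 away from its partner (all inside tolerance).
pose_a = [("HBD", (0.8, 0.0, 0.0)), ("HBA", (3.8, 0.0, 0.0)), ("AR", (0.8, 4.0, 0.0))]
# Pose B: two perfect overlaps, one feature 1.2 away (outside tolerance).
pose_b = [("HBD", (0.0, 0.0, 0.0)), ("HBA", (3.0, 0.0, 0.0)), ("AR", (0.0, 5.2, 0.0))]

for name, pose in (("A", pose_a), ("B", pose_b)):
    print(name, count_matched_features(reference, pose), round(rmsd(reference, pose), 3))
```

In this toy case pose B has the lower RMSD but matches fewer features, which is the kind of suboptimal result the abstract attributes to RMSD-driven alignment.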


2021 ◽  
Vol 30 (1) ◽  
pp. 97-121
Author(s):  
Tien-Ping Tan ◽  
Chai Kim Lim ◽  
Wan Rose Eliza Abdul Rahman

A parallel text corpus is an important resource for building a machine translation (MT) system. Existing resources such as translated documents, bilingual dictionaries, and translated subtitles are excellent sources for constructing a parallel text corpus. A sentence alignment algorithm automatically aligns source sentences and target sentences because manual sentence alignment is resource-intensive. Over the years, sentence alignment approaches have improved from sentence length heuristics to statistical lexical models to deep neural networks. Solving the alignment problem as a classification problem is interesting as classification is the core of machine learning. This paper proposes a parallel long short-term memory with attention and convolutional neural network (parallel LSTM+Attention+CNN) for classifying two sentences as parallel or non-parallel sentences. A sliding window approach is also proposed with the classifier to align sentences in the source and target languages. The proposed approach was compared with three classifiers, namely the feedforward neural network, CNN, and bi-directional LSTM. It was also compared with the BleuAlign sentence alignment system. The classification accuracy of these models was evaluated using a Malay-English parallel text corpus and the UN French-English parallel text corpus. The Malay-English sentence alignment performance was then evaluated using research documents and the very challenging Classical Malay-English document. The proposed classifier obtained more than 80% accuracy in categorizing parallel/non-parallel sentences with a model built using only five thousand training parallel sentences. It has a higher sentence alignment accuracy than the other baseline systems.
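One way a pairwise classifier could drive alignment with a sliding window is sketched below. The abstract does not give the details of the authors' procedure, so the monotone window, the `window` and `threshold` parameters, and the toy scorer are all assumptions; `is_parallel_prob` stands in for any trained classifier that returns a probability.

```python
def align_sentences(source, target, is_parallel_prob, window=2, threshold=0.5):
    """Align each source sentence to at most one target sentence.

    For every source sentence, only the next `window` unconsumed target
    sentences are scored; the best one is accepted if its probability
    reaches `threshold`. Alignment is assumed to be monotone.
    """
    pairs = []
    next_j = 0  # first target index not yet consumed
    for i, src in enumerate(source):
        candidates = range(next_j, min(len(target), next_j + window))
        best = max(candidates,
                   key=lambda j: is_parallel_prob(src, target[j]),
                   default=None)
        if best is not None and is_parallel_prob(src, target[best]) >= threshold:
            pairs.append((i, best))
            next_j = best + 1
    return pairs

# Toy stand-in for a classifier: sentences are "parallel" when they share an ID suffix.
def toy_scorer(src, tgt):
    return 1.0 if src[1:] == tgt[1:] else 0.0

source = ["a1", "a2", "a3"]
target = ["b1", "noise", "b2", "b3"]
print(align_sentences(source, target, toy_scorer))
```

A real system would replace `toy_scorer` with the neural classifier and would likely need a wider window for documents with many insertions or deletions.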


2021 ◽  
Author(s):  
Daniel Liu ◽  
Martin Steinegger

Background: The Smith-Waterman-Gotoh alignment algorithm is the most popular method for comparing biological sequences. Recently, Single Instruction Multiple Data (SIMD) methods have been used to speed up alignment. However, these algorithms have limitations: they are optimized for specific scoring schemes, cannot handle large gaps, or require quadratic-time computation. Results: We propose a new algorithm called block aligner for aligning nucleotide and protein sequences. It greedily shifts and grows a block of computed scores to span large gaps within the aligned sequences. This greedy approach computes only a fraction of the DP matrix. In exchange for these features, there is no guarantee that the computed scores are accurate compared to full DP. However, in our experiments, we show that block aligner performs accurately on various realistic datasets, and it is up to 9 times faster than the popular Farrar's algorithm for protein global alignments. Conclusions: Our algorithm has applications in computing global alignments and X-drop alignments on proteins and long reads. It is available as a Rust library at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
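For reference, the full quadratic-time DP that the abstract uses as its accuracy baseline can be sketched as a textbook Gotoh affine-gap global alignment. This is not block aligner; the scoring parameters are illustrative assumptions, and a gap of length k is charged `gap_open + (k - 1) * gap_extend`.

```python
NEG_INF = float("-inf")

def gotoh_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Optimal global alignment score with affine gaps (fills the whole matrix)."""
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])

print(gotoh_global_score("ACGGT", "ACT"))
```

Every cell of three (n+1) by (m+1) matrices is filled here, which is the cost a block-based heuristic tries to avoid by computing only a band of cells around the likely alignment path.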


2021 ◽  
pp. 977-985
Author(s):  
Guo-Da Cheng ◽  
Guo-Liang Yang ◽  
Jia-He Xia ◽  
Yang Shang ◽  
Sihai Li

2021 ◽  
Author(s):  
Samantha Petti ◽  
Nicholas Bhattacharya ◽  
Roshan Rao ◽  
Justas Dauparas ◽  
Neil Thomas ◽  
...  

Multiple Sequence Alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF mildly improves contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing the predicted confidence metric, we can learn MSAs that improve structure predictions over the initial MSAs. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment.
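The core idea of a smooth Smith-Waterman can be sketched by replacing each `max` in the recursion with a temperature-scaled log-sum-exp, which is differentiable in its inputs. This is a minimal sketch with a linear gap penalty, not the authors' exact formulation; the function names, the scoring values, and the final smoothing over all cells are assumptions.

```python
import math

def smooth_max(values, temperature):
    """Temperature-scaled log-sum-exp; temperature == 0 falls back to the hard max."""
    top = max(values)
    if temperature == 0:
        return top
    # Subtracting the maximum keeps every exponent <= 0 (numerically stable).
    return top + temperature * math.log(
        sum(math.exp((v - top) / temperature) for v in values))

def smooth_sw_score(sim, gap=-1.0, temperature=1.0):
    """Local alignment score over a similarity matrix `sim` (rows: a, cols: b)."""
    n, m = len(sim), len(sim[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    cells = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = smooth_max(
                [0.0,                                  # start a new local alignment
                 H[i - 1][j - 1] + sim[i - 1][j - 1],  # extend with an aligned pair
                 H[i - 1][j] + gap,                    # gap in b
                 H[i][j - 1] + gap],                   # gap in a
                temperature)
            cells.append(H[i][j])
    return smooth_max(cells, temperature)

a, b = "GATTACA", "ATTAC"
sim = [[2.0 if x == y else -1.0 for y in b] for x in a]
print(smooth_sw_score(sim, temperature=0))     # hard Smith-Waterman score
print(smooth_sw_score(sim, temperature=1.0))   # smoothed upper bound on that score
```

As the temperature approaches zero the smoothed score approaches the ordinary Smith-Waterman score; in an automatic-differentiation framework the same recursion would yield gradients with respect to the entries of `sim`.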


2021 ◽  
Author(s):  
Wence Shi ◽  
Jiangning Xu ◽  
Hongyang He ◽  
Ding Li ◽  
Hongqiong Tang ◽  
...  

2021 ◽  
Vol 43 (3) ◽  
pp. 1652-1668
Author(s):  
Xiangwen Wang ◽  
Yonggang Lu ◽  
Jiaxuan Liu

Three-dimensional (3D) reconstruction in single-particle cryo-electron microscopy (cryo-EM) is a significant technique for recovering the 3D structure of proteins or other biological macromolecules from their two-dimensional (2D) noisy projection images taken from unknown random directions. Class averaging in single-particle cryo-EM is an important procedure for producing high-quality initial 3D structures, where image alignment is a fundamental step. In this paper, an efficient image alignment algorithm using 2D interpolation in the frequency domain of images is proposed to improve the estimation accuracy of alignment parameters of rotation angles and translational shifts between the two projection images, which can obtain subpixel and subangle accuracy. The proposed algorithm firstly uses the Fourier transform of two projection images to calculate a discrete cross-correlation matrix and then performs the 2D interpolation around the maximum value in the cross-correlation matrix. The alignment parameters are directly determined according to the position of the maximum value in the cross-correlation matrix after interpolation. Furthermore, the proposed image alignment algorithm and a spectral clustering algorithm are used to compute class averages for single-particle 3D reconstruction. The proposed image alignment algorithm is firstly tested on a Lena image and two cryo-EM datasets. Results show that the proposed image alignment algorithm can estimate the alignment parameters accurately and efficiently. The proposed method is also used to reconstruct preliminary 3D structures from a simulated cryo-EM dataset and a real cryo-EM dataset and to compare them with RELION. Experimental results show that the proposed method can obtain more high-quality class averages than RELION and can obtain higher reconstruction resolution than RELION even without iteration.
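The two steps described above (cross-correlation computed through the Fourier transform, then interpolation around its maximum) can be sketched in one dimension for translation only. This is a simplified illustration, not the paper's 2D method: the naive O(n^2) DFT, the three-point parabolic fit, and the name `estimate_shift` are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def estimate_shift(reference, moved):
    """Estimate s such that moved[t] is close to reference[t - s] (circular)."""
    n = len(reference)
    f_ref, f_mov = dft(reference), dft(moved)
    # Correlation theorem: corr[s] = sum_t moved[t] * reference[t - s]
    cross = [fm * fr.conjugate() for fm, fr in zip(f_mov, f_ref)]
    corr = [c.real for c in dft(cross, inverse=True)]
    peak = max(range(n), key=corr.__getitem__)
    # Fit a parabola through the peak and its two neighbours for sub-sample accuracy.
    left, mid, right = corr[(peak - 1) % n], corr[peak], corr[(peak + 1) % n]
    denom = left - 2 * mid + right
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    shift = peak + offset
    return shift - n if shift > n / 2 else shift

# Toy signals: a Gaussian bump and two shifted copies of it.
n = 32
reference = [math.exp(-((t - 10) ** 2) / 8.0) for t in range(n)]
moved_int = [reference[(t - 5) % n] for t in range(n)]                # shifted by 5 samples
moved_frac = [math.exp(-((t - 15.3) ** 2) / 8.0) for t in range(n)]   # shifted by 5.3 samples

print(estimate_shift(reference, moved_int))
print(estimate_shift(reference, moved_frac))
```

A 2D version would apply the same peak refinement to a cross-correlation matrix, and rotation would need an additional search or a polar resampling step that this sketch leaves out.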


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1849
Author(s):  
Dan-Marian Joiţa ◽  
Mihaela Aurelia Tomescu ◽  
Donatella Bàlint ◽  
Lorentz Jäntschi

Protein alignment finds its application in refining results of sequence alignment and understanding protein function. A previous study aligned single molecules, making use of the minimization of sums of the squares of eigenvalues, obtained for the antisymmetric Cartesian coordinate distance matrices Dx and Dy. This is used in our program to search for similarities between amino acids by comparing the sums of the squares of eigenvalues associated with the Dx, Dy, and Dz distance matrices. These matrices are obtained by removing atoms that could lead to low similarity. Candidates are aligned, and trilateration is used to attach all previously stripped atoms. The TM-score is the scoring function that chooses the best alignment from the supplied candidates. Twenty essential amino acids that take many forms in nature are selected for comparison. The correct alignment is taken into account most of the time by the alignment algorithm. It was numerically detected by the TM-score 70% of the time, on average, and 15% more cases with close scores can be easily distinguished by human observation.
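For readers unfamiliar with the TM-score, a minimal sketch of the standard formula is given below: each aligned pair contributes 1 / (1 + (d / d0)^2), and the sum is normalized by the target length, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. How the paper applies the score to small molecules is not stated in the abstract; the 0.5 floor on d0 for short targets and the function name are assumptions.

```python
def tm_score(distances, target_length, d0_floor=0.5):
    """TM-score from per-pair distances (in angstroms) after superposition.

    `distances` holds one entry per aligned pair; unaligned positions of the
    target contribute nothing but still count in `target_length`.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = d0_floor  # the cube-root formula is undefined or too small here
    d0 = max(d0, d0_floor)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

print(tm_score([0.0] * 100, 100))   # perfect superposition of all 100 positions
print(tm_score([0.0] * 50, 100))    # only half of the target aligned
print(tm_score([4.0] * 100, 100))   # every pair 4 angstroms apart
```

Because the score is normalized by the target length and bounded by 1, candidate alignments of the same target can be ranked directly by this value.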


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
G Rios-Munoz ◽  
C Perez-Hernandez ◽  
F Fernandez-Aviles ◽  
A Arenal

Introduction: There exist many imaging techniques and systems to reproduce atrial chambers in 3D. These technologies include electroanatomical (EA) mapping systems, noninvasive electrocardiographic imaging (ECGI), magnetic resonance imaging (MRI), or computed tomography (CT) scans. In the case of atrial fibrillation (AF), the most employed non-pharmacological treatment is catheter ablation to electrically isolate the pulmonary veins from the rest of the left atrium. Driver mechanisms such as focal or rotational activity have been proposed as possible initiating and maintaining mechanisms of AF. However, correspondence and validation of these sites when several systems are employed in the same patient remains a challenge, as they are mostly manually aligned based on visual inspection. Purpose: To develop an automatic 3D alignment algorithm for cardiac 3D meshes to colocalize points between atrial maps generated with multiple EA mapping systems, ECGI, MRI, or CT scans. Methods: A total of 25 left atrial meshes from persistent AF patients were exported from an EA mapping system. The total number of vertices for all the meshes was 2,545,444 points (101,817.8 ± 13,593.3 points per map). A reference mesh was employed with minor modifications [1]. All meshes were manually segmented into 12 different left atrial regions, see Table for the region names. The method implements a non-rigid variant of the iterative closest point algorithm to transform the atrial mesh onto the reference one, see Figure. The geographical distance between the mean position of the 12 different segmented reference areas and the 12 transformed points was employed as the performance metric. Results: The global error for all the fiducial points in all left atrial meshes was 11.57 ± 2.55 mm. The average local errors for the 12 atrial areas are summarized in the Table. The best three aligned areas were the RSPV, atrial septum, and lateral wall. The areas with lower alignment accuracy were the LAA, LSPV, and atrial roof. Conclusions: The algorithm provides a promising solution to evaluate and validate site-related results from different systems, e.g., rotational activity presence between EA mapping and ECGI systems. The method works automatically for any given chamber anatomy or any number of points. No prior segmentation is needed since the transformation and co-localization are applied to the raw chamber mesh. Further analysis with a larger mesh database is needed. Funding Acknowledgement: Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Instituto de Salud Carlos III and Ministerio de Ciencia, Innovación y Universidades
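The performance metric described in the Methods can be sketched as a per-region distance between mean positions. This is only an illustration of the metric, not of the non-rigid registration itself; it assumes straight-line Euclidean distance (the abstract does not define its distance precisely), and the region names and coordinates below are made up.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def region_errors(reference_regions, transformed_regions):
    """Distance between region centroids, keyed by region name."""
    return {name: math.dist(centroid(reference_regions[name]),
                            centroid(transformed_regions[name]))
            for name in reference_regions}

# Hypothetical data: two regions, coordinates in millimetres.
reference_regions = {
    "roof":   [(0.0, 0.0, 10.0), (2.0, 0.0, 10.0)],
    "septum": [(5.0, 5.0, 0.0), (5.0, 7.0, 0.0)],
}
transformed_regions = {
    "roof":   [(0.0, 3.0, 14.0), (2.0, 3.0, 14.0)],   # centroid displaced by (0, 3, 4)
    "septum": [(5.0, 5.0, 0.0), (5.0, 7.0, 0.0)],     # perfectly registered
}

errors = region_errors(reference_regions, transformed_regions)
global_error = sum(errors.values()) / len(errors)
print(errors, global_error)
```

Averaging the per-region values gives a single global figure comparable in spirit to the reported global error, though the study's own aggregation may differ.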

