optimal alignments

Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7201
Author(s):  
Christian Permann ◽  
Thomas Seidel ◽  
Thierry Langer

Chemical features of small molecules can be abstracted to 3D pharmacophore models, which are easy to generate, interpret, and adapt by medicinal chemists. Three-dimensional pharmacophores can be used to efficiently match and align molecules according to their chemical feature pattern, which facilitates the virtual screening of even large compound databases. Existing alignment methods, used in computational drug discovery and bio-activity prediction, are often not suitable for finding matches between pharmacophores accurately as they purely aim to minimize RMSD or maximize volume overlap, when the actual goal is to match as many features as possible within the positional tolerances of the pharmacophore features. As a consequence, the obtained alignment results are often suboptimal in terms of the number of geometrically matched feature pairs, which increases the false-negative rate, thus negatively affecting the outcome of virtual screening experiments. We addressed this issue by introducing a new alignment algorithm, Greedy 3-Point Search (G3PS), which aims at finding optimal alignments by using a matching-feature-pair maximizing search strategy while at the same time being faster than competing methods.
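
The objective described in this abstract, counting feature pairs that fall within positional tolerance rather than minimizing RMSD, can be illustrated with a minimal sketch. This is not the G3PS algorithm: it assumes the two pharmacophores are already superimposed in one coordinate frame, and the function name, the tuple layout, and the greedy nearest-neighbour pairing are hypothetical choices made for illustration only.

```python
import math


def count_matched_features(reference, query):
    """Count reference features that have a same-type query feature in tolerance.

    reference: list of (feature_type, (x, y, z), tolerance_radius)
    query:     list of (feature_type, (x, y, z))
    Both sets are assumed to be superimposed already (hypothetical setup).
    Each query feature can be used at most once (greedy, nearest first).
    """
    used = set()
    matched = 0
    for ref_type, ref_pos, tolerance in reference:
        best_index = None
        best_distance = None
        for j, (query_type, query_pos) in enumerate(query):
            if j in used or query_type != ref_type:
                continue
            distance = math.dist(ref_pos, query_pos)
            if distance <= tolerance and (best_distance is None or distance < best_distance):
                best_index, best_distance = j, distance
        if best_index is not None:
            used.add(best_index)
            matched += 1
    return matched


reference = [("HBD", (0.0, 0.0, 0.0), 1.0),
             ("HBA", (3.0, 0.0, 0.0), 1.0),
             ("AR", (0.0, 4.0, 0.0), 1.5)]
query = [("HBD", (0.5, 0.0, 0.0)),
         ("HBA", (3.0, 0.0, 2.0)),
         ("AR", (0.0, 5.0, 0.0))]
print(count_matched_features(reference, query))
```

A pose with a slightly higher RMSD can still score better under this count if it brings one more feature inside its tolerance sphere, which is the distinction the abstract draws.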


2021 ◽  
Author(s):  
Xuan Song ◽  
Hai Yun Gao ◽  
Karl Herrup ◽  
Ronald P Hart

Gene expression studies using chimeric xenograft transplants or co-culture systems have proven to be valuable to uncover cellular dynamics and interactions during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are re-aligned with individual genomes, generating >97% accuracy across a range of species ratios. Alignment-independent methods, such as Convolutional Neural Networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. Our evaluation identifies valuable and effective strategies to dissect species composition of RNA sequencing data from mixed populations.


2021 ◽  
Author(s):  
Chirag Jain ◽  
Daniel Gibney ◽  
Sharma V. Thankachan

Abstract Motivation Co-linear chaining has proven to be a powerful technique for finding approximately optimal alignments and approximating edit distance. It is used as an intermediate step in numerous mapping tools that follow a seed-and-extend strategy. Despite this popularity, subquadratic time algorithms for the case where chains support anchor overlaps and gap costs are not currently known. Moreover, a theoretical connection between co-linear chaining cost and edit distance remains unknown. Results We present algorithms to solve the co-linear chaining problem with anchor overlaps and gap costs in Õ(n) time, where n denotes the count of anchors. We establish the first theoretical connection between co-linear chaining cost and edit distance. Specifically, we prove that for a fixed set of anchors under a carefully designed chaining cost function, the optimal ‘anchored’ edit distance equals the optimal co-linear chaining cost. Finally, we demonstrate experimentally that optimal co-linear chaining cost under the proposed cost function can be computed significantly faster than edit distance, and achieves high correlation with edit distance for closely as well as distantly related sequences. Implementation https://github.com/at-cg/[email protected], [email protected], [email protected]
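
For readers unfamiliar with co-linear chaining, the basic idea can be sketched with a textbook quadratic-time dynamic program. This is not the Õ(n) algorithm of the abstract and does not use its cost function: the sketch assumes non-overlapping anchors given as hypothetical (x, y, length) exact matches, and it scores a chain as the total anchor length minus an illustrative gap cost equal to the larger of the two gaps between consecutive anchors.

```python
def chain_anchors(anchors):
    """Return (best_score, best_chain) for anchors given as (x, y, length).

    Simplified O(n^2) sketch: anchor i may precede anchor j only if it ends
    before j starts on both sequences (no overlaps allowed here).
    Gap cost between consecutive anchors = max(gap_x, gap_y), an assumption.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0, []
    score = [length for _, _, length in anchors]  # chain consisting of j alone
    prev = [-1] * n
    for j in range(n):
        xj, yj, len_j = anchors[j]
        for i in range(j):
            xi, yi, len_i = anchors[i]
            gap_x = xj - (xi + len_i)
            gap_y = yj - (yi + len_i)
            if gap_x < 0 or gap_y < 0:
                continue  # i does not strictly precede j on both sequences
            candidate = score[i] + len_j - max(gap_x, gap_y)
            if candidate > score[j]:
                score[j] = candidate
                prev[j] = i
    # Trace back from the best-scoring chain end.
    end = max(range(n), key=lambda k: score[k])
    best_score = score[end]
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    chain.reverse()
    return best_score, chain


example = [(0, 0, 5), (6, 6, 4), (3, 20, 6), (12, 12, 3)]
print(chain_anchors(example))
```

In the example, the three anchors near the main diagonal chain together, while the off-diagonal anchor at (3, 20) is left out because no other anchor can precede or follow it on both sequences.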


2020 ◽  
Author(s):  
Nicola De Maio

Abstract Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The “cumulative indel model” approximates realistic evolutionary indel dynamics using differential equations. “Adaptive banding” reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block (≈530 kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods. [Evolutionary alignment; pairHMM; sequence evolution; statistical alignment; statistical genetics.]


2020 ◽  
Vol 36 (12) ◽  
pp. 3712-3718
Author(s):  
Charlotte A Darby ◽  
Ravi Gaddipati ◽  
Michael C Schatz ◽  
Ben Langmead

Abstract Motivation Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Results Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these ‘gold standard’ Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-maximal exact match and vg to align more reads correctly. Availability and implementation Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license. Supplementary information Supplementary data are available at Bioinformatics online.
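
What a heuristic-free optimal score means here can be sketched with a plain dynamic program for semiglobal alignment under affine gap penalties. This is not the Vargas implementation, which the abstract describes as parallelized and vectorized C++ that also handles graph genomes and quality-scaled mismatch penalties. The sketch covers only a linear reference, and the function name and the default scoring values are illustrative assumptions.

```python
NEG_INF = float("-inf")


def semiglobal_score(read, ref, match=2, mismatch=-6, gap_open=5, gap_extend=3):
    """Best score for aligning the whole read anywhere inside ref.

    Affine gaps: a gap of length k costs gap_open + k * gap_extend.
    Leading and trailing reference bases are free (semiglobal mode).
    Scoring values are placeholders, not any particular aligner's defaults.
    """
    n = len(ref)
    h_prev = [0] * (n + 1)          # row 0: the read may start at any ref position
    f_prev = [NEG_INF] * (n + 1)    # F: alignment ends with a read base against a gap
    for i in range(1, len(read) + 1):
        h_row = [0] * (n + 1)
        f_row = [NEG_INF] * (n + 1)
        f_row[0] = -(gap_open + i * gap_extend)  # first i read bases unaligned
        h_row[0] = f_row[0]
        e = NEG_INF                 # E: alignment ends with a ref base against a gap
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_row[j - 1] - gap_open - gap_extend)
            f_row[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            substitution = match if read[i - 1] == ref[j - 1] else mismatch
            h_row[j] = max(h_prev[j - 1] + substitution, e, f_row[j])
        h_prev, f_prev = h_row, f_row
    return max(h_prev)              # the read may end at any ref position


print(semiglobal_score("ACGT", "TTACGTTT"))
print(semiglobal_score("ACGT", "TTACTGTTT"))
```

Because every cell of the table is filled, the returned score is optimal for the chosen scoring function, at a cost proportional to the product of the read and reference lengths. That cost is the trade-off heuristic aligners avoid.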


2020 ◽  
Vol 36 (12) ◽  
pp. 3892-3893
Author(s):  
Antonio Benítez-Hidalgo ◽  
Antonio J Nebro ◽  
José F Aldana-Montes

Abstract Motivation Multiple sequence alignment (MSA) consists of finding the optimal alignment of three or more biological sequences to identify highly conserved regions that may be the result of similarities and relationships between the sequences. MSA is an optimization problem with NP-hard complexity (non-deterministic polynomial-time hardness), because the time needed to find optimal alignments rises exponentially along with the number of sequences and their length. Furthermore, the problem becomes multiobjective when more than one score is considered to assess the quality of an alignment, such as maximizing the percentage of totally conserved columns and minimizing the number of gaps. Our motivation is to provide a Python tool for solving MSA problems using evolutionary algorithms, a nonexact stochastic optimization approach that has proven to be effective for solving multiobjective problems. Results The software tool we have developed, called Sequoya, is written in the Python programming language, which offers a broad set of libraries for data analysis, visualization and parallelism. Thus, Sequoya offers a graphical tool to visualize the progress of the optimization in real time, the ability to guide the search toward a preferred region in run-time, parallel support to distribute the computation among nodes in a distributed computing system, and a graphical component to assist in the analysis of the solutions found at the end of the optimization. Availability and implementation Sequoya can be freely obtained from the Python Package Index (pip) or, alternatively, it can be downloaded from GitHub at https://github.com/benhid/Sequoya. Supplementary information Supplementary data are available at Bioinformatics online.
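
The two objectives named in this abstract, the percentage of totally conserved columns and the number of gaps, can be computed for a candidate alignment as in the minimal sketch below. It does not use Sequoya's API. The function name is hypothetical, the alignment is assumed to be a list of equal-length strings with "-" as the gap symbol, and all-gap columns are assumed not to count as conserved.

```python
def msa_objectives(alignment):
    """Return (percent_totally_conserved_columns, total_gap_count).

    alignment: list of equal-length strings using '-' for gaps (assumed format).
    A column counts as totally conserved when every row holds the same
    non-gap symbol.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = sum(
        1 for column in zip(*alignment)
        if column[0] != "-" and len(set(column)) == 1
    )
    gaps = sum(row.count("-") for row in alignment)
    percent = 100.0 * conserved / width if width else 0.0
    return percent, gaps


print(msa_objectives(["AC-GT", "ACAGT", "AC-GA"]))
```

The two values usually conflict, since adding gaps can create more conserved columns. That tension is what makes the problem multiobjective: a solver would try to maximize the first value while minimizing the second.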


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Dong Han ◽  
Yinhua Tian

In order to improve the efficiency of conformance checking in business process management, a business alignment approach is presented based on transition systems between relation matrices and Petri nets. Firstly, a log-based relation matrix of the events is obtained according to the event log. Then, the events in the relation matrix are observed, the transitions in the model are fired, and the activities in the log and in the model are compared. Next, the states of the log and the model are recorded until no new state can be generated, so a transition system can be obtained which includes optimal alignments between the event log and the process model. Finally, two detailed algorithms are presented to obtain an optimal alignment and all optimal alignments between the trace and the model based on the given cost function, respectively. The availability and effectiveness of the proposed approach are proved theoretically.


2019 ◽  
Author(s):  
Charlotte A. Darby ◽  
Ravi Gaddipati ◽  
Michael C. Schatz ◽  
Ben Langmead

Abstract Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these “gold standard” Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM, and vg to align more reads correctly. Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license.


2019 ◽  
Vol 11 (18) ◽  
pp. 5058
Author(s):  
Guido Marseglia ◽  
Carlo Maria Medaglia ◽  
Francisco A. Ortega ◽  
Juan A. Mesa

The achievement of some of the Sustainable Development Goals (SDGs) from the recent 2030 Agenda for Sustainable Development has drawn the attention of many countries towards urban transport networks. Mathematical modeling constitutes an analytical tool for the formal description of a transportation system whereby it facilitates the introduction of variables and the definition of objectives to be optimized. One of the stages of the methodology followed in the design of urban transit systems starts with the determination of corridors to optimize the population covered by the system whilst taking into account the mobility patterns of potential users and the time saved when the public network is used instead of private means of transport. Since the capture of users occurs at stations, it seems reasonable to consider an extensive and homogeneous set of candidate sites evaluated according to the parameters considered (such as pedestrian population captured and destination preferences) and to select subsets of stations so that alignments can take place. The application of optimization procedures that decide the sequence of nodes composing the alignment can produce zigzagging corridors, which are less appropriate for the design of a single line. The main aim of this work is to include a new criterion to avoid the zigzag effect when the alignment is about to be determined. For this purpose, a curvature concept for polygonal lines is introduced, and its performance is analyzed when criteria of maximizing coverage and minimizing curvature are combined in the same design algorithm. The results show the application of the mathematical model presented for a real case in the city of Seville in Spain.
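
The abstract does not define its curvature concept for polygonal lines. One common discrete stand-in, assumed here purely for illustration, is the total absolute turning angle summed over interior vertices. The function name and the 2D point format are hypothetical, and the paper's actual definition may differ.

```python
import math


def total_turning_angle(points):
    """Sum of absolute turning angles (radians) along a 2D polygonal line.

    points: list of (x, y) station coordinates in visiting order.
    A straight line scores 0; sharper and more frequent turns score higher.
    This is an assumed proxy for curvature, not the paper's definition.
    """
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0          # incoming segment
        bx, by = x2 - x1, y2 - y1          # outgoing segment
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        total += abs(math.atan2(cross, dot))  # signed turn angle, made absolute
    return total


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1)]
print(total_turning_angle(straight), total_turning_angle(zigzag))
```

A design algorithm combining the two criteria could, for example, subtract a weighted multiple of this value from the coverage score, so that a zigzagging corridor is penalized relative to a smoother one with similar coverage.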

