alignment quality
Recently Published Documents



2021 ◽  
Vol 2056 (1) ◽  
pp. 012044
Author(s):  
T P Tkachenko ◽  
A A Zhukov ◽  
E P Pozhidaev

Abstract The paper considers the possibility of controlling the alignment quality of helical nanostructures of ferroelectric liquid crystals (FLCs), within the concept of a biaxial surface potential, by varying the FLC helical pitch p0 and the structure of the polymer aligning layers.


2021 ◽  
Author(s):  
Mateo Gray ◽  
Sean Chester ◽  
Hosna Jabbari

Abstract Background Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. Results We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment and uses covariation and thermodynamic energy minimization to predict secondary structures for each individual sequence in the alignment. We compared KnotAli’s performance to that of three other alignment-based programs, on a large data set of 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). We found KnotAli to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. Conclusions We find KnotAli’s predictions to be less dependent on alignment quality. In particular, KnotAli is shown to have more accurate predictions compared to other leading methods as alignment quality deteriorates. KnotAli can be found online on github at https://github.com/mateog4712/KnotAli


2021 ◽  
Author(s):  
Fulong Yu ◽  
Vijay G. Sankaran ◽  
Guo-Cheng Yuan

Abstract Genome-wide profiling of transcription factor binding and chromatin states is a widely-used approach for mechanistic understanding of gene regulation. Recent technology development has enabled such profiling at single-cell resolution. However, an end-to-end computational pipeline for analyzing such data is still lacking. To fill this gap, we have developed a flexible pipeline for analysis and visualization of single-cell CUT&RUN and CUT&Tag data, which provides functions for sequence alignment, quality control, dimensionality reduction, cell clustering, data aggregation, and visualization. Furthermore, it is also seamlessly integrated with the functions in original CUT&RUNTools for population-level analyses. As such, this provides a valuable toolbox for the community.


Author(s):  
Matteo Vidali ◽  
Anna Carobene ◽  
Sara Apassiti Esposito ◽  
Gavino Napolitano ◽  
Alessandra Caracciolo ◽  
...  

2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Valery Polyanovsky ◽  
Alexander Lifanov ◽  
Natalia Esipova ◽  
Vladimir Tumanyan

Abstract Background The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and the parameters of the insertion-deletion penalty function. These matrices are derived from sequence alignment and thus reflect the evolutionary process. Currently, in addition to evolutionary matrices, a large number of matrices of other origins have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins. Results We tested the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum), structural-alignment-based matrices, a contact energy matrix, and a matrix based on the properties of the genetic code. This study presents results for two test set types: first, we simulated sequences that reflect divergent evolution; second, we performed tests on Balibase sequences. In both cases, we obtained the dependence of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and on the evolutionary distance to which the substitution matrices correspond. The combination of matrices and penalty parameters was optimized for both local and global alignment over the values of the penalty function parameters. We found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We analysed how the correlation coefficients of matrices correspond to the alignment quality. Matrices showing high-quality alignment have an above-average correlation value, but the converse is not true.
Conclusions This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property holds not only for matrices of evolutionary origin but also for matrices of other origins that correspond to a large evolutionary distance. Accordingly, matrices based on structural data show alignment quality close to that of evolutionary matrices. This agrees with the idea that the spatial structure is more conservative than the protein sequence.


2020 ◽  
Author(s):  
Wenfa Ng

Abstract Understanding how one sequence relates to another at the nucleotide or amino acid level allows the derivation of new knowledge regarding the provenance of a particular sequence, as well as the determination of consensus sequence motifs that inform biological conservation at the sequence level. To this end, local and multiple sequence alignment tools in bioinformatics have been developed to automatically profile two or more nucleotide or amino acid sequences in search of matching stretches that yield an alignment. While alignment score is a common metric for assessing alignment quality, the relative difference between alignment scores does not readily correlate with concrete measures such as the number of mismatches and the length of the longest match in the alignment. Thus, using the swalign local sequence alignment function in MATLAB on 200 alignments between RNA-seq sequence reads and the reference Escherichia coli K-12 MG1655 genome sequence in the sense and antisense directions, this work sought to shed light on how the alignment score from swalign correlates with the number of mismatches and the length of the longest match. Results revealed that the number of mismatches negatively correlates with alignment score, validating theoretical predictions that a larger number of mismatches would result in a poorer alignment and a lower alignment score. However, the dependence of alignment score on other factors, such as the length of the longest match and the gap penalty for opening an alignment gap, prevents a linear relationship from being obtained between the number of mismatches and alignment score. On the other hand, the length of the longest match was found to positively correlate with alignment score, as predicted from theoretical understanding. But the data points cluster in two regions of the scatter plot: short matches with low alignment scores, and long matches with high alignment scores. Such clustering, and the sparseness of data points between the two clusters, preclude the elucidation of a linear quantitative relationship between the length of the longest match and alignment score. Overall, the dependence of the swalign alignment score on the number of mismatches and the length of the longest match agrees with theoretical predictions, validating the utility of alignment score as a qualitative indicator of alignment quality. However, given that alignment score inherently depends on a multitude of factors, users cannot easily discern the quantitative difference in mismatches and length of longest match from relative differences between two alignment scores. Such problems are unlikely to be resolved given the near impossibility of obtaining a quantitative linear relationship correlating either the number of mismatches or the length of the longest match with the alignment score of a sequence alignment tool.
Highlights
Number of mismatches in alignment negatively correlates with alignment score.
Length of longest match positively correlates with alignment score.
Quantitative linear relationship could not be obtained for alignment score with either number of mismatches or length of longest match.
Results validate that the swalign tool in MATLAB can quantitatively detect differences in alignment quality and express them using alignment score.
But, relative alignment score of two alignments remains a nebulous concept with regard to differences in number of mismatches and length of longest match.
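To make the relationship between mismatches and local alignment score concrete, the following is a minimal Python sketch of Smith-Waterman scoring. It is not the MATLAB swalign implementation used in the study: the function name smith_waterman_score, the match/mismatch values, and the linear gap penalty are all assumptions chosen for illustration, whereas swalign works with a scoring matrix and its own gap parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Illustrative scoring only (assumed values): +2 per match,
    -1 per mismatch, -2 per gap position (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # no mismatches
    print(smith_waterman_score("ACGT", "ACTT"))  # one mismatch
```

Under these assumed parameters a single mismatch lowers the score, but by an amount that depends on where it falls and on whether a shorter mismatch-free stretch scores better, which is one reason the score does not map linearly onto the mismatch count.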


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2700 ◽  
Author(s):  
Yihang Jiang ◽  
Yuankai Qi ◽  
Will Ke Wang ◽  
Brinnae Bent ◽  
Robert Avram ◽  
...  

The dynamic time warping (DTW) algorithm is widely used in pattern matching and sequence alignment tasks, including speech recognition and time series clustering. However, DTW algorithms perform poorly when aligning sequences of uneven sampling frequencies. This makes it difficult to apply DTW to practical problems, such as aligning signals that are recorded simultaneously by sensors with different, uneven, and dynamic sampling frequencies. As multi-modal sensing technologies become increasingly popular, it is necessary to develop methods for high quality alignment of such signals. Here we propose a DTW algorithm called EventDTW which uses information propagated from defined events as basis for path matching and hence sequence alignment. We have developed two metrics, the error rate (ER) and the singularity score (SS), to define and evaluate alignment quality and to enable comparison of performance across DTW algorithms. We demonstrate the utility of these metrics on 84 publicly-available signals in addition to our own multi-modal biomedical signals. EventDTW outperformed existing DTW algorithms for optimal alignment of signals with different sampling frequencies in 37% of artificial signal alignment tasks and 76% of real-world signal alignment tasks.
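For orientation, the classic DTW recurrence that such variants build on can be sketched in a few lines of Python. This is the textbook algorithm only, not EventDTW, and it does not compute the ER or SS metrics; the function name dtw_distance and the absolute-difference local cost are assumptions made for illustration.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses |x[i] - y[j]| as the local cost (an assumed choice) and allows
    the three standard steps: match, insertion, deletion.
    """
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # The second signal repeats one sample, as if sampled at a higher rate.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because one sample may be matched to several samples in the other sequence, a repeated value is absorbed at zero cost here; with noisy signals and strongly uneven sampling rates, the same freedom can produce the many-to-one matches that alignment-quality metrics are meant to detect.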


2020 ◽  
Vol 18 (02) ◽  
pp. 2050005
Author(s):  
Sanjay Bankapur ◽  
Nagamma Patil

Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in analyzing biological sequences, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. To analyze these growing volumes of raw sequences, there is a strong need for computationally efficient (polynomial time) models that produce accurate alignments. In this study, a progressive alignment model named ProgSIO-MSA is proposed, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences using a combination of two scoring strategies: Look Back Ahead, which scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pair score, and Position-Residue-Specific Dynamic Gap Penalty, which dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes the local optima trap to improve the alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model is increased by 17.7% on the BAliBASE dataset. The proposed model’s efficiency is compared with state-of-the-art models using time complexity as well as runtime analysis. Wilcoxon signed-rank statistical test results show that the alignment quality of the proposed model is significantly better than that of progressive-based state-of-the-art models.
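The sum-of-pair score mentioned above can be illustrated with a short Python sketch. It is a generic version under assumed scoring values (match +1, mismatch -1, residue-gap -2, gap-gap 0) and is not the ProgSIO-MSA scoring system, which uses a mutation matrix and dynamic gap penalties; the function name sum_of_pairs is likewise hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple sequence alignment.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    Scoring values are illustrative assumptions, not a published scheme.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        # Score every unordered pair of symbols in this column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue          # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap      # residue paired with a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

Since every column is scored over all sequence pairs, the cost of evaluating one alignment grows quadratically with the number of sequences, which is part of why heuristic progressive methods are preferred over exact optimization.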

