pairwise alignment
Recently Published Documents


TOTAL DOCUMENTS

152
(FIVE YEARS 47)

H-INDEX

16
(FIVE YEARS 3)

Cells ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 230
Author(s):  
Kevin-Phil Wüsthoff ◽  
Gerhard Steger

In 1985, Keese and Symons proposed a hypothesis on the sequence and secondary structure of viroids from the family : their secondary structure can be subdivided into five structural and functional domains and “viroids have evolved by rearrangement of domains between different viroids infecting the same cell and subsequent mutations within each domain”; this article is one of the most cited in the field of viroids. Employing the pairwise alignment method used by Keese and Symons, together with more recent methods, we tried to reproduce the original results and extend them to further members of which were unknown in 1985. Indeed, individual members of consist of a patchwork of sequence fragments from the family, but the lengths of the fragments do not point to consistent points of rearrangement, which is in conflict with the original hypothesis of fixed domain borders.


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1809
Author(s):  
Xuhua Xia

Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inferences. Large-scale genomic analyses are typically associated with automated progressive MSA without subsequent manual adjustment, which itself is often error-prone because of the lack of a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that cannot be avoided by progressive MSA for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its speedy computation, making it suitable for a large number of long viral genomic sequences.
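
To make the idea of a position weight matrix concrete, the following is a minimal sketch, not the author's implementation. It assumes a nucleotide alphabet, a uniform background frequency, and a pseudocount of 1; the function names `build_pwm` and `score_segment` and the toy alignment are hypothetical. Each column's log-odds scores are estimated from all aligned sequences at once, which is the property the abstract contrasts with pairwise scores.

```python
import math
from collections import Counter


def build_pwm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Return one dict per alignment column: symbol -> log2 odds vs. a uniform background."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background frequencies
    pwm = []
    for col in range(length):
        # Count only alphabet symbols; gap characters are ignored.
        counts = Counter(seq[col] for seq in aligned if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pwm.append({
            sym: math.log2(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return pwm


def score_segment(pwm, segment, start=0):
    """Sum the PWM scores of an ungapped segment placed at column `start`."""
    return sum(
        pwm[start + i][sym]
        for i, sym in enumerate(segment)
        if sym in pwm[start + i]
    )


# Toy alignment (hypothetical data).
msa = ["ACGT-A", "ACGTTA", "AC-TTA", "ACGTTA"]
pwm = build_pwm(msa)
print(score_segment(pwm, "ACGTTA"), score_segment(pwm, "TTTTTT"))
```

A candidate placement of a misaligned segment can then be compared against alternatives by its summed score, with higher scores indicating better agreement with the rest of the alignment.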


2021 ◽  
Author(s):  
Samantha Petti ◽  
Nicholas Bhattacharya ◽  
Roshan Rao ◽  
Justas Dauparas ◽  
Neil Thomas ◽  
...  

Multiple Sequence Alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF mildly improves contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing the predicted confidence metric, we can learn MSAs that improve structure predictions over the initial MSAs. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment.
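
The core trick behind a smooth Smith-Waterman can be sketched as follows: every hard `max` in the local alignment recursion is replaced by a temperature-scaled log-sum-exp, so the score becomes a differentiable function of the substitution scores. This is an illustrative sketch under simplifying assumptions (linear gap penalty, fixed match/mismatch scores, pure Python without automatic differentiation); it is not the authors' implementation, and the function names are hypothetical.

```python
import math


def logsumexp(values, temperature):
    """Smooth maximum: temperature * log(sum(exp(v / temperature))), computed stably."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values)
    )


def smooth_smith_waterman(a, b, match=2.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Smoothed local alignment score; approaches the hard score as temperature -> 0."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    cells = [0.0]  # the empty alignment scores 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = logsumexp(
                [0.0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap],
                temperature,
            )
            cells.append(H[i][j])
    # Hard Smith-Waterman takes the maximum over all cells; here it is smoothed too.
    return logsumexp(cells, temperature)


print(smooth_smith_waterman("ACGT", "ACGT", temperature=0.001))
```

At low temperature the value is close to the classical Smith-Waterman score; at higher temperatures it blends contributions from many alignments, which is what makes gradient-based learning possible in a framework with automatic differentiation.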


Plant Disease ◽  
2021 ◽  
Author(s):  
Nikita Zrelovs ◽  
Gunta Resevica ◽  
Ieva Kalnciema ◽  
Helvijs Niedra ◽  
Gunārs Lācis ◽  
...  

Blackcurrants (Ribes nigrum) are among the most important commercial berry crops in Latvia and, together with redcurrants and gooseberries, have a long history of local cultivation and breeding (Zuļģe et al. 2018). So far, at least 20 viruses have been reported to infect Ribes plants (Špak et al. 2021). Blackcurrant-associated rhabdovirus (BCaRV) was previously identified in the USA by high throughput sequencing (HTS) of a blackcurrant germplasm accession introduced from Russia (isolate Veloy) and now serves as an exemplar isolate for BCaRV (Wu et al. 2018). The presence of BCaRV was also confirmed by RT-PCR in a blackcurrant germplasm accession of cv. Burga from France during the same study by Wu et al. (2018). Currently, Blackcurrant betanucleorhabdovirus is one of the nine species recognized by ICTV in the genus Betanucleorhabdovirus of the family Rhabdoviridae, but the impact of BCaRV on the host remains unknown. Leaf tissue from twelve asymptomatic blackcurrant cv. Mara Eglite plants that tested negative for blackcurrant reversion virus from Dobele, Latvia (56°36'31.9"N, 23°18'13.6"E) was collected on May 17, 2019 and used for an HTS study of local Ribes resistance genes. Total RNA from the leaf tissue of sampled plants was isolated following a method described by Kalinowska et al. (2012) with minor modifications. Briefly, the RNeasy Plant Mini Kit (QIAGEN) was used with RLC lysis buffer supplemented with 2% PVP and 1% β-mercaptoethanol. Plant rRNA was subsequently removed with a RiboMinus Plant Kit for RNA-Seq (Thermo Fisher Scientific (TFS)) prior to cDNA library construction. HTS libraries were prepared using the MGIEasy RNA Directional Library Prep Set for 16 reactions (MGI), following a protocol for 150 bp paired-end reads. According to the manufacturer's guidelines, libraries were pooled, circularized and cleaned before being subjected to sequencing on a DNBSEQ-G400 (MGI) using a PE150 flow cell (MGI). The sequencing run yielded a total of 393660492 read pairs, each 150 bp long.
Reads were assembled into transcripts using rnaSPAdes v3.13.1 (Bushmanova et al. 2019), and a 14424-base-long contig with an average coverage of 684x was found to be 99.5% identical (14358/14432 identities and 8 gaps in the pairwise alignment) to the previously reported first complete genome of BCaRV (MF543022.1) using EMBOSS Needle (Madeira et al. 2019). This contig, representing the genome of BCaRV isolate Mara Eglite, onto which 66768 of the raw reads could be mapped, was subsequently deposited at the European Nucleotide Archive under accession number OU015520. All twelve individual samples were also tested for the presence of BCaRV by RT-PCR, using the Verso cDNA Synthesis Kit with random hexamer primers (TFS) for first strand cDNA synthesis followed by PCR with N protein nested primers BCaRV-N-F (5’ AGATGTGCTTCATCGATGGCTAGTTCTGCT 3’) and BCaRV-N-R (5’ TGCATTCCCACGGGTTAGGAATACATTGGTACT 3’), resulting in a 243 bp long fragment for six of the samples. RT-PCR products from the six BCaRV-positive samples were directly sequenced by the Sanger method using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) with the BCaRV-N-F and BCaRV-N-R primers. The acquired RT-PCR product sequences matched the corresponding region of the BCaRV isolate Mara Eglite genome assembled from HTS data. In this report, we have documented the natural occurrence of BCaRV in Latvia, which is the second piece of evidence for the presence of BCaRV in Europe.
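
The identity figure quoted above is the number of identical columns divided by the alignment length, 14358/14432 ≈ 99.5%. The sketch below shows how such a figure arises from a global (Needleman-Wunsch) alignment. It is a simplified illustration, not EMBOSS Needle itself: it assumes a linear gap penalty and fixed match/mismatch scores, whereas Needle uses a substitution matrix with affine gap penalties. The function names and the toy sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top, bottom):
    """Identical, non-gap columns as a percentage of the alignment length."""
    identities = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * identities / len(top)


top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(top, bottom, round(percent_identity(top, bottom), 1))

# The figure reported in the abstract, from its own counts.
reported = round(100.0 * 14358 / 14432, 1)
```

Note that the denominator is the alignment length including gap columns (14432), not the length of either sequence, which is why the contig length (14424) does not appear in the ratio.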


2021 ◽  
Vol 58 (3) ◽  
pp. 248-262
Author(s):  
S. K. Brar ◽  
N. Singla ◽  
L. D. Singla

Summary: This first comprehensive report from Punjab province of India relates to patho-physiological alterations along with morpho-molecular characterisation and risk assessment of natural infections of Hymenolepis diminuta and Hymenolepis nana in 291 commensal rodents, including house rat, Rattus rattus (n=201), and lesser bandicoot rat, Bandicota bengalensis (n=90). The small intestine of 53.61 % and 64.95 % of rats was found infected with H. diminuta and H. nana, respectively, with a concurrent infection rate of 50.86 %. There was no association between host sex and H. diminuta or H. nana infection (χ² = 0.016 and 0.08, respectively, d.f. = 1, P>0.05), while host age had a significant effect on the prevalence of H. diminuta and H. nana infection (χ² = 28.12 and 7.18, respectively, d.f. = 1, P≤0.05). Examination of faecal samples and intestinal contents revealed globular eggs of H. diminuta without polar filaments (76.50 ± 3.01 μm x 67.62 ± 2.42 μm), while the smaller oval eggs of H. nana had 4 – 8 polar filaments (47.87 ± 1.95 μm x 36.12 ± 3.05 μm). Cestode infection caused enteritis, sloughing of intestinal mucosa, necrosis of villi and an inflammatory reaction with infiltration of mononuclear cells in the mucosa and submucosa. Morphometric identification of the adult cestodes recovered from the intestinal lumen was confirmed by molecular characterisation based on the nuclear ITS-2 locus, which showed a single band of 269 bp and 242 bp for H. diminuta and H. nana, respectively. Pairwise alignment of the ITS-2 regions showed 99.46 % similarity with sequences of H. diminuta from the USA and 100 % similarity with sequences of H. nana from Kosice, Slovakia.


2021 ◽  
Author(s):  
Abbas Haghi ◽  
Santiago Marco-Sola ◽  
Lluc Alvarez ◽  
Dionysios Diamantopoulos ◽  
Christoph Hagleitner ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hugo Talibart ◽  
François Coste

Abstract. Background: To assign structural and functional annotations to the ever increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Methods: We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed with respect to a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark which have the lowest sequence identity (between 3% and 20%) and for which reliable Potts models can be built for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time (1′37″ on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases.
Conclusions: These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experiments nevertheless suggest that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yanyan Chen ◽  
Yan Wang ◽  
Juan Chen ◽  
Wu Zuo ◽  
Yong Fan ◽  
...  

Abstract: Chromosomes pair and synapse with their homologous partners to segregate correctly at the first meiotic division. Association of telomeres with the LINC (Linker of Nucleoskeleton and Cytoskeleton) complex composed of SUN1 and KASH5 enables telomere-led chromosome movements and telomere bouquet formation, facilitating precise pairwise alignment of homologs. Here, we identify a direct interaction between SUN1 and Speedy A (SPDYA) and determine the crystal structure of human SUN1-SPDYA-CDK2 ternary complex. Analysis of meiosis prophase I process in SPDYA-binding-deficient SUN1 mutant mice reveals that the SUN1-SPDYA interaction is required for the telomere-LINC complex connection and the assembly of a ring-shaped telomere supramolecular architecture at the nuclear envelope, which is critical for efficient homologous pairing and synapsis. Overall, our results provide structural insights into meiotic telomere structure that is essential for meiotic prophase I progression.


2021 ◽  
Author(s):  
Ron Zeira ◽  
Max Land ◽  
Benjamin J. Raphael

Abstract: Spatial transcriptomics (ST) is a new technology that measures mRNA expression across thousands of spots on a tissue slice, while preserving information about the spatial location of spots. ST is typically applied to several replicates from adjacent slices of a tissue. However, existing methods to analyze ST data do not take full advantage of the similarity in both gene expression and spatial organization across these replicates. We introduce a new method, PASTE (Probabilistic Alignment of ST Experiments), to align and integrate ST data across adjacent tissue slices leveraging both transcriptional similarity and spatial distances between spots. First, we formalize and solve the problem of pairwise alignment of ST data from adjacent tissue slices, or layers, using Fused Gromov-Wasserstein Optimal Transport (FGW-OT), which accounts for variability in the composition and spatial location of the spots on each layer. From these pairwise alignments, we construct a 3D representation of the tissue. Next, we introduce the problem of simultaneous alignment and integration of multiple ST layers into a single layer with a low rank gene expression matrix. We derive an algorithm to solve the problem by alternating between solving FGW-OT instances and solving a Non-negative Matrix Factorization (NMF) of a weighted expression matrix. We show on both simulated and real ST datasets that PASTE accurately aligns spots across adjacent layers and accurately estimates a consensus expression matrix from multiple ST layers. PASTE outperforms integration methods that rely solely on either transcriptional similarity or spatial similarity, demonstrating the advantages of combining both types of information. Code availability: Software is available at https://github.com/raphael-group/paste


2021 ◽  
Vol 24 (1) ◽  
pp. 7-14
Author(s):  
Nenik Kholilah ◽  
Norma Afiati ◽  
Subagiyo Subagiyo ◽  
Retno Hartati

O. laqueus was first discovered only recently, in 2005, in the Ryukyu Islands, Japan. Its geographical distribution and molecular identification are therefore still rarely reported. Before this study was carried out, only six mt-DNA COI nucleotide sequences for O. laqueus had been uploaded to GenBank. Since DNA barcoding of mt-DNA COI has some advantageous characteristics, this study aimed to analyse the genetic difference between Indonesian O. laqueus and the data available in GenBank. Samples were collected in 2019 - 2020 from Karimunjawa (n=16) and Bangka-Belitung (n=2). The mt-DNA COI was extracted using the 10% Chelex method, PCR-amplified using Folmer’s primers and sequenced by the Sanger method. Pairwise alignment and genetic distance calculations were carried out in MEGA-X, whereas the phylogenetic tree was reconstructed using Bayesian methods. BLAST identification resulted in 685 bp with identities ranging from 92.07 to 99.24 percent. The mean pairwise genetic distances within clades were 0.002 and 0.006, whilst the distance between the clades was 0.0883. Considering this result together with the ITF current, it is concluded that O. laqueus taken from Karimunjawa belongs to the same species as those in Malaysia (MN711655) and Japan (AB302176). Specimens from Bangka-Belitung are suggested to belong to a different species, as they were separated into the second clade by 8.83%. One single sample from Japan (AB430543), which lay outside the two clades by 11.63%-11.38%, was also suggested to represent a different species. Overall, this study opens the way to various further studies on O. laqueus using other genetic marker loci.
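
A pairwise genetic distance of the kind reported above can be illustrated with the simplest measure, the p-distance: the proportion of compared sites at which two aligned sequences differ. This is a sketch only. The abstract does not state which distance model was used in MEGA-X, so the choice of p-distance is an assumption, and the function name and toy sequences are hypothetical.

```python
def p_distance(seq1, seq2, skip="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap or an ambiguous base in either sequence are
    excluded (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in skip or y in skip:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (hypothetical data, not COI sequences from the study).
print(p_distance("ACGTACGT", "ACGTACGA"))
print(p_distance("AC-TACGT", "ACGTACGA"))
```

On this scale, the between-clade distance of 0.0883 reported in the abstract corresponds to the 8.83% separation it quotes for the second clade.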

