Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments

Author(s):  
Tavor Z. Baharav ◽  
Govinda M. Kamath ◽  
David N. Tse ◽  
Ilan Shomorony
Patterns ◽  
2020 ◽  
Vol 1 (6) ◽  
pp. 100081

2019 ◽  
Author(s):  
Tavor Z. Baharav ◽  
Govinda M. Kamath ◽  
David N. Tse ◽  
Ilan Shomorony

Abstract: A key step in many genomic analysis pipelines is the identification of regions of similarity between pairs of DNA sequencing reads. This task, known as pairwise sequence alignment, is a heavy computational burden, particularly in the context of third-generation long-read sequencing technologies, which produce noisy reads. This issue is commonly addressed via a two-step approach: first, we filter pairs of reads which are likely to have a large alignment, and then we perform computationally intensive alignment algorithms only on the selected pairs. The Jaccard similarity between the set of k-mers of each read can be shown to be a proxy for the alignment size, and is usually used as the filter. This strategy has the added benefit that the Jaccard similarities don’t need to be computed exactly, and can instead be efficiently estimated through the use of min-hashes. This is done by hashing all k-mers of a read and computing the minimum hash value (the min-hash) for each read. For a randomly chosen hash function, the probability that the min-hashes are the same for two distinct reads is precisely their k-mer Jaccard similarity. Hence, one can estimate the Jaccard similarity by computing the fraction of min-hash collisions out of the set of hash functions considered.
However, when the k-mer distribution of the reads being considered is significantly non-uniform, Jaccard similarity is no longer a good proxy for the alignment size. In particular, genome-wide GC biases and the presence of common k-mers increase the probability of a min-hash collision, thus biasing the estimate of alignment size provided by the Jaccard similarity. In this work, we introduce a min-hash-based approach for estimating alignment sizes called Spectral Jaccard Similarity which naturally accounts for an uneven k-mer distribution in the reads being compared.
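
The min-hash estimate described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of seeded BLAKE2b as a stand-in for a family of random hash functions, and the parameter choices are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer, seed):
    """Hypothetical hash family: BLAKE2b salted with an integer seed."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def min_hashes(seq, k, num_hashes):
    """One min-hash per hash function: the smallest hash value over the read's k-mers."""
    ks = kmers(seq, k)
    return [min(hash_kmer(x, seed) for x in ks) for seed in range(num_hashes)]


def exact_jaccard(a, b, k):
    """Exact k-mer Jaccard similarity, for comparison with the estimate."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


def minhash_jaccard(a, b, k, num_hashes):
    """Estimate the Jaccard similarity as the fraction of min-hash collisions."""
    ma = min_hashes(a, k, num_hashes)
    mb = min_hashes(b, k, num_hashes)
    return sum(x == y for x, y in zip(ma, mb)) / num_hashes


if __name__ == "__main__":
    read_a = "ACGTTGCATGCCGATAGCTAGGCTAACGT"
    read_b = "TGCATGCCGATAGCTAGGCTAACGTTTGA"
    print(exact_jaccard(read_a, read_b, 5), minhash_jaccard(read_a, read_b, 5, 200))
```

With more hash functions the estimate concentrates around the exact value, at the cost of more hashing per read.
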
The Spectral Jaccard Similarity is computed by considering a min-hash collision matrix (where rows correspond to pairs of reads and columns correspond to different hash functions), removing an offset, and performing a singular value decomposition. The leading left singular vector provides the Spectral Jaccard Similarity for each pair of reads. In addition, we develop an approximation to the Spectral Jaccard Similarity that can be computed with a single matrix-vector product, instead of a full singular value decomposition.
We demonstrate improvements in AUC of the Spectral Jaccard Similarity based filters over Jaccard Similarity based filters on 40 datasets of PacBio reads from the NCTC collection. The code is available at https://github.com/TavorB/spectral_jaccard_similarity.
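
The collision-matrix computation can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation (see the repository above for that). The abstract does not say what the offset is; the sketch assumes it amounts to working with the complement matrix 1 − C, which is rank one in expectation if a collision in row j and column k occurs with probability 1 − (1 − p_j)(1 − q_k). The leading left singular vector is found by power iteration in pure Python rather than a full SVD, and the final rescaling into a score is an arbitrary choice made here so that larger values mean more similar. All function names are hypothetical.

```python
import math


def leading_left_singular_vector(M, iters=100):
    """Power iteration for the leading left singular vector of a non-negative matrix."""
    n, m = len(M), len(M[0])
    v = [1.0 / math.sqrt(m)] * m
    u = [0.0] * n
    for _ in range(iters):
        # u <- M v, normalised
        u = [sum(M[j][k] * v[k] for k in range(m)) for j in range(n)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return [0.0] * n  # M is all zeros: no direction to recover
        u = [x / norm_u for x in u]
        # v <- M^T u, normalised
        v = [sum(M[j][k] * u[j] for j in range(n)) for k in range(m)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
    return u


def spectral_scores(collisions):
    """Score each row (read pair) of a 0/1 collision matrix; larger means more similar.

    Assumption: the offset is removed by taking the complement 1 - C.
    """
    M = [[1.0 - c for c in row] for row in collisions]
    u = leading_left_singular_vector(M)
    top = max(u)
    if top == 0.0:
        return [1.0] * len(u)  # every hash collided for every pair
    # Illustrative normalisation only: the least similar pair gets score 0.
    return [1.0 - x / top for x in u]


if __name__ == "__main__":
    # Rows: read pairs. Columns: hash functions. 1 marks a min-hash collision.
    C = [
        [1, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(spectral_scores(C))
```

Because only the ordering of the scores matters when they are used as a filter, the normalisation chosen here does not affect which pairs would be selected at a given cutoff rank.
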


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Etai Jacob ◽  
Ron Unger ◽  
Amnon Horovitz

Methods for analysing correlated mutations in proteins are becoming an increasingly powerful tool for predicting contacts within and between proteins. Nevertheless, limitations remain due to the requirement for large multiple sequence alignments (MSA) and the fact that, in general, only the relatively small number of top-ranking predictions are reliable. To date, methods for analysing correlated mutations have relied exclusively on amino acid MSAs as inputs. Here, we describe a new approach for analysing correlated mutations that is based on combined analysis of amino acid and codon MSAs. We show that a direct contact is more likely to be present when the correlation between the positions is strong at the amino acid level but weak at the codon level. The performance of different methods for analysing correlated mutations in predicting contacts is shown to be enhanced significantly when amino acid and codon data are combined.


1999 ◽  
Vol 173 ◽  
pp. 185-188
Author(s):  
Gy. Szabó ◽  
K. Sárneczky ◽  
L.L. Kiss

Abstract: A widely used tool in studying quasi-monoperiodic processes is the O–C diagram. This paper deals with the application of this diagram in minor planet studies. The main difference between our approach and the classical O–C diagram is that we transform the epoch (=time) dependence into the geocentric longitude domain. We outline a rotation modelling using this modified O–C and illustrate its capabilities with detailed error analysis. The primary assumption, that the monotonicity and the shape of this diagram are (almost) independent of the geometry of the asteroids, is discussed and tested. The monotonicity enables an unambiguous distinction between prograde and retrograde rotation, thus the four-fold (or in some cases the two-fold) ambiguities can be avoided. This turned out to be the main advantage of the O–C examination. As an extension to the theoretical work, we present some preliminary results on 1727 Mette based on new CCD observations.
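
For readers unfamiliar with the classical O–C diagram, a minimal Python sketch follows. It covers only the standard time-domain form, in which each observed event time is compared with a linear ephemeris T0 + P·E, and not the longitude-domain variant proposed in the paper. The function name and the sample values are hypothetical.

```python
def o_minus_c(observed_times, t0, period):
    """Classical O-C residuals against the linear ephemeris C = t0 + period * E.

    For each observed time, E is taken as the nearest integer cycle count.
    Returns a list of (E, O - C) pairs in the same time unit as the input.
    """
    residuals = []
    for t in observed_times:
        epoch = round((t - t0) / period)   # nearest whole cycle
        computed = t0 + period * epoch     # predicted event time
        residuals.append((epoch, t - computed))
    return residuals


if __name__ == "__main__":
    # Hypothetical event times (days), reference epoch, and trial period.
    times = [0.000, 0.252, 0.507, 0.761]
    for epoch, oc in o_minus_c(times, t0=0.0, period=0.25):
        print(epoch, oc)
```

A steadily growing or shrinking residual in such a table suggests that the trial period is slightly too short or too long.
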


Author(s):  
V. Mizuhira ◽  
Y. Futaesaku

Previously we reported that tannic acid is a very effective fixative for proteins, including polypeptides. In particular, in cross sections of microtubules, thirteen subunits in the A-tubule and eleven in the B-tubule could be observed very clearly. An elastic fiber could also be demonstrated very clearly as an electron-opaque, homogeneous fiber. However, tannic acid did not penetrate into the deep portion of the tissue block, so we tried catechin. It shows almost the same chemical nature with respect to proteins as tannic acid. Moreover, we thought that catechin should have two active reaction sites: one is phenol, and the other is catechol. The catechol site should react with osmium to make Os-black. The phenol site should react with peroxidase existing perhydroxide.


Author(s):  
K. Chien ◽  
R. Van de Velde ◽  
I.P. Shintaku ◽  
A.F. Sassoon

Immunoelectron microscopy of neoplastic lymphoma cells is valuable for precise localization of surface antigens and identification of cell types. We have developed a new approach in which the immunohistochemical staining can be evaluated prior to embedding for EM and the desired area subsequently selected for ultrathin sectioning. A freshly prepared lymphoma cell suspension is spun onto polylysine hydrobromide-coated glass slides by cytocentrifugation and immediately fixed without air drying in polylysine paraformaldehyde (PLP) fixative. After rinsing in PBS, slides are stained by a 3-step immunoperoxidase method. The cell monolayer is then fixed in buffered 3% glutaraldehyde prior to the DAB reaction. After the DAB reaction step, wet monolayers can be examined under LM for the presence of brown reaction product, and selected monolayers are then processed by routine methods for EM and embedded with the Chien Re-embedding Mold. After polymerization, the epoxy blocks are easily separated from the glass slides by heating on a 100°C hot plate for 20 seconds.


Author(s):  
W. A. Chiou ◽  
N. Kohyama ◽  
B. Little ◽  
P. Wagner ◽  
M. Meshii

The corrosion of copper and copper alloys in a marine environment is of great concern because of their widespread use in heat exchangers and steam condensers in which natural seawater is the coolant. It has become increasingly evident that microorganisms play an important role in the corrosion of a number of metals and alloys under a variety of environments. For the past 15 years SEM has proven useful in studying biofilms and spatial relationships between bacteria and localized corrosion of metals. Little information, however, has been obtained using TEM, capitalizing on its higher spatial resolution and the transmission observation of interfaces. The research presented herein is the first step of this new approach in studying corrosion with biological influence in pure copper. Commercially produced copper (Cu, 99%) foils approximately 120 μm thick, exposed to a copper-tolerant marine bacterium, Oceanospirillum, and an abiotic culture medium, were subsampled (1 cm × 1 cm) for this study along with unexposed control samples.


Author(s):  
Arthur V. Jones

With the introduction of field-emission sources and “immersion-type” objective lenses, the resolution obtainable with modern scanning electron microscopes is approaching that obtainable in STEM and TEM, but only with specific types of specimens. Bulk specimens still suffer from the restrictions imposed by internal scattering and the need to be conducting. Advances in coating techniques have largely overcome these problems, but for a sizeable body of specimens the restrictions imposed by coating are unacceptable. For such specimens, low voltage operation, with its low beam penetration and freedom from charging artifacts, is the method of choice. Unfortunately, the technical difficulties in producing an electron beam sufficiently small and of sufficient intensity are considerably greater at low beam energies, so much so that a radical reevaluation of conventional design concepts is needed. The probe diameter is usually given by


1968 ◽  
Vol 32 (3) ◽  
pp. 279-282
Author(s):  
JI Mock ◽  
JW Grenfell ◽  
WA Richter

1969 ◽  
Vol 34 (2) ◽  
pp. 176-176

In the November 1968 issue of this journal, Margaret M. Martyn’s name was misspelled Martin on page 315. In the same issue, page 325, column 2 (Jerger, Speaks, and Trammell, “A New Approach to Speech Audiometry”), the sentence reading “Whenever the loss is sloping, however, the PB area underestimates and the SSI area overestimates the amount of handicap” should read as follows: “Whenever the loss is sloping, however, the PB area overestimates and the SSI area underestimates the amount of the handicap.”

