Algorithmic Exploration of Axiom Spaces for Efficient Similarity Search at Large Scale

Most functions of the nervous system depend on neuronal and glial morphology. Continuous advances in microscopic imaging and tracing software have made 3D reconstructions of arborizing dendrites, axons, and processes increasingly abundant, allowing their detailed study. However, efficient, large-scale methods to rank neural morphologies by similarity to an archetype are still lacking. Using the NeuroMorpho.Org database, we present similarity search software enabling fast morphological comparison of hundreds of thousands of neural reconstructions from any species, brain region, cell type, and preparation protocol. We compared the performance of different morphological measurements: 1) summary morphometrics calculated by L-Measure, 2) persistence vectors, a vectorized descriptor of branching structure, and 3) the combination of the two. In all cases, we also investigated the impact of applying dimensionality reduction using principal component analysis (PCA). We assessed qualitative performance by gauging the ability to rank neurons in order of visual similarity. Moreover, we quantified information content by examining explained variance and benchmarked the ability to identify occasional duplicate reconstructions of the same specimen. The results indicate that combining summary morphometrics and persistence vectors and then applying PCA provides an information-rich characterization that enables efficient and precise comparison of neural morphology. The execution time scaled linearly with data set size, allowing seamless live searching through the entire NeuroMorpho.Org content in fractions of a second. We have deployed the similarity search function as an open-source online software tool, both through a user-friendly graphical interface and as an API for programmatic access.
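
The pipeline described in this abstract (concatenate two descriptor sets, apply PCA, rank by similarity to a query) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are random toy vectors rather than L-Measure morphometrics or persistence vectors, the helper names `pca_project` and `rank_by_similarity` are hypothetical, and Euclidean distance is a stand-in for whatever metric the deployed tool uses. It requires NumPy.

```python
import numpy as np

def pca_project(features, n_components):
    """Standardize each column, then project onto the top principal components."""
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)  # explained-variance ratio per component
    return z @ vt[:n_components].T, explained[:n_components]

def rank_by_similarity(embedded, query_index):
    """Return the indices of all other rows, nearest first (linear scan)."""
    dists = np.linalg.norm(embedded - embedded[query_index], axis=1)
    order = np.argsort(dists, kind="stable")
    return [int(i) for i in order if i != query_index]

# Toy stand-ins for the two descriptor families (assumed shapes, random values).
rng = np.random.default_rng(0)
morphometrics = rng.normal(size=(6, 4))
persistence = rng.normal(size=(6, 5))
combined = np.hstack([morphometrics, persistence])
combined[3] = combined[0] + 1e-6  # plant a near-duplicate of reconstruction 0

embedded, explained = pca_project(combined, n_components=3)
ranking = rank_by_similarity(embedded, query_index=0)
print("explained variance ratios:", explained)
print("most similar to reconstruction 0:", ranking)
```

The planted near-duplicate (row 3) ranks first for query 0, which mirrors the duplicate-detection benchmark mentioned in the abstract. Each query is a single pass over all rows, so the cost grows linearly with the number of reconstructions.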


On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-75193-1_54 ◽

2018 ◽

pp. 450-457

Author(s):

Alexander Ocsa ◽

Jose Luis Huillca ◽

Cristian Lopez del Alamo

Keyword(s):

Similarity Search ◽

Large Scale ◽

Approximate Similarity Search ◽

Approximate Similarity


Descriptor Fingerprints and Their Application to White Wine Clustering and Discrimination

Acta Scientifica Naturalis ◽

10.2478/asn-2018-0004 ◽

2018 ◽

Vol 5 (1) ◽

pp. 24-34

Author(s):

I. P. Bangov ◽

M. Moskovkina ◽

B. P. Stojanov

Keyword(s):

Similarity Search ◽

Large Scale ◽

Analytical Data ◽

Analytical Laboratory ◽

Statistical Process ◽

White Wines ◽

Laboratory Parameters ◽

Individual Laboratory ◽

Individual Cluster

Abstract: This study continues the attempt to apply statistical processing to large-scale analytical data. A group of 3898 white wines, each described by 11 analytical laboratory benchmarks, was analyzed by a fingerprint similarity search in order to group the wines into separate clusters. The quality of the wines in each individual cluster was then characterized according to individual laboratory parameters.


Tree quantization for large-scale similarity search and classification

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2015.7299052 ◽

2015 ◽

Cited By ~ 43

Author(s):

Artem Babenko ◽

Victor Lempitsky

Keyword(s):

Similarity Search ◽

Large Scale ◽

Scale Similarity


Weighted hashing for fast large scale similarity search

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13 ◽

10.1145/2505515.2507851 ◽

2013 ◽

Cited By ~ 9

Author(s):

Qifan Wang ◽

Dan Zhang ◽

Luo Si

Keyword(s):

Similarity Search ◽

Large Scale ◽

Scale Similarity


Sparse Semantic Hashing for Efficient Large Scale Similarity Search

Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management - CIKM '14 ◽

10.1145/2661829.2662145 ◽

2014 ◽

Cited By ~ 1

Author(s):

Qifan Wang ◽

Bin Shen ◽

Zhiwei Zhang ◽

Luo Si

Keyword(s):

Similarity Search ◽

Large Scale ◽

Scale Similarity


A neural algorithm for a fundamental computing problem

10.1101/180471 ◽

2017 ◽

Author(s):

Sanjoy Dasgupta ◽

Charles F. Stevens ◽

Saket Navlakha

Keyword(s):

Similarity Search ◽

Large Scale ◽

Activity Patterns ◽

Locality Sensitive Hashing ◽

Sensory Function ◽

Information Retrieval Systems ◽

Novel Variant ◽

Benchmark Datasets ◽

Similar Images ◽

Traditional Approaches

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly’s olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly’s circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly’s algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.
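
For readers unfamiliar with the traditional algorithm this abstract refers to, here is a minimal sketch of one classic form of locality-sensitive hashing, the random-hyperplane (sign projection) scheme. It is not the fly-inspired variant, whose three ingredients the abstract does not spell out. The function names, the 16-bit signature length, and the toy vectors are all assumptions made for illustration.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, n_bits=16)
query = [1.0, 0.5, -0.2, 0.3]
scaled = [2.0 * x for x in query]   # same direction as the query
opposite = [-x for x in query]      # exactly the opposite direction

sig_query = lsh_signature(query, planes)
sig_scaled = lsh_signature(scaled, planes)
sig_opposite = lsh_signature(opposite, planes)

print(hamming(sig_query, sig_scaled))    # vectors with the same direction collide
print(hamming(sig_query, sig_opposite))  # opposite vectors disagree on the bits
```

Vectors separated by a small angle agree on most bits, so comparing short signatures (or bucketing by them) approximates a cosine-similarity search without comparing the full vectors.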


Finding human gene-disease associations using a Network Enhanced Similarity Search (NESS) of multi-species heterogeneous functional genomics data

10.1101/2020.03.11.987552 ◽

2020 ◽

Cited By ~ 1

Author(s):

Timothy Reynolds ◽

Jason A. Bubier ◽

Michael A. Langston ◽

Elissa J. Chesler ◽

Erich J. Baker

Keyword(s):

Substance Use ◽

Functional Genomics ◽

Substance Use Disorders ◽

Similarity Search ◽

Large Scale ◽

Biological Pathways ◽

Gene Annotations ◽

Link Type ◽

Disease Associations ◽

Biological Entities

Abstract: Disease diagnosis and treatment is challenging in part due to the misalignment of diagnostic categories with the underlying biology of disease. The evaluation of large-scale genomic experimental datasets is a compelling approach to refining the classification of biological concepts, such as disease. Well-established approaches, some of which rely on information theory or network analysis, quantitatively assess relationships among biological entities using gene annotations, structured vocabularies, and curated data sources. However, the gene annotations used in these evaluations are often sparse, potentially biased due to uneven study and representation in the literature, and constrained to the single species from which they were derived. In order to overcome these deficiencies inherent in the structure and sparsity of these annotated datasets, we developed a novel Network Enhanced Similarity Search (NESS) tool which takes advantage of multi-species networks of heterogeneous data to bridge sparsely populated datasets. NESS employs a random walk with restart algorithm across harmonized multi-species data, effectively compensating for sparsely populated and noisy genomic studies. We further demonstrate that it is highly resistant to spurious or sparse datasets and generates significantly better recapitulation of ground truth biological pathways than other similarity metrics alone. Furthermore, since NESS has been deployed as an embedded tool in the GeneWeaver environment, it can rapidly take advantage of curated multi-species networks to provide informative assertions of relatedness of any pair of biological entities or concepts, e.g., gene-gene, gene-disease, or phenotype-disease associations. NESS ultimately enables multi-species analysis applications to leverage model organism data to overcome the challenge of data sparsity in the study of human disease.

Availability and Implementation: Implementation available at https://geneweaver.org/ness. Source code freely available at https://github.com/treynr/ness.

Author summary: Finding consensus among large-scale genomic datasets is an ongoing challenge in the biomedical sciences. Harmonizing and analyzing such data is important because it allows researchers to mitigate the idiosyncrasies of experimental systems, alleviate study biases, and augment sparse datasets. Additionally, it allows researchers to utilize animal model studies and cross-species experiments to better understand biological function in health and disease. Here we provide a tool for integrating and analyzing heterogeneous functional genomics data using a graph-based model. We show how this type of analysis can be used to identify similar relationships among biological entities such as genes, processes, and disease through shared genomic associations. Our results indicate this approach is effective at reducing biases caused by sparse and noisy datasets. We show how this type of analysis can be used to aid the classification of gene function and the prioritization of genes involved in substance use disorders. In addition, our analysis reveals genes and biological pathways with shared association to multiple, co-occurring substance use disorders.
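
The random walk with restart named in the abstract can be sketched generically as a power iteration on a small graph. This is not the NESS implementation: the graph, the node names (`gene_A`, `disease_X`, and so on), the restart probability of 0.25, and the function name are all hypothetical choices made for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.25, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * (one walk step from p) + restart * e_seed.

    adjacency maps each node to the list of its neighbours; the returned dict
    holds the converged visiting probability of every node for a walker that
    jumps back to the seed with probability `restart` at each step.
    """
    nodes = sorted(adjacency)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical heterogeneous graph mixing gene and disease nodes (undirected).
graph = {
    "gene_A": ["disease_X", "gene_B"],
    "gene_B": ["gene_A", "disease_X"],
    "disease_X": ["gene_A", "gene_B", "gene_C"],
    "gene_C": ["disease_X", "gene_D"],
    "gene_D": ["gene_C"],
}

scores = random_walk_with_restart(graph, seed="gene_A")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

Nodes that are reachable from the seed through many short paths accumulate higher scores, so the scores can serve as a relatedness ranking even when two entities share no direct annotation.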
