similarity search Latest Research Papers

Large scale similarity search across digital reconstructions of neural morphology

Most functions of the nervous system depend on neuronal and glial morphology. Continuous advances in microscopic imaging and tracing software have made 3D reconstructions of arborizing dendrites, axons, and processes increasingly abundant, allowing their detailed study. However, efficient, large-scale methods to rank neural morphologies by similarity to an archetype are still lacking. Using the NeuroMorpho.Org database, we present similarity search software that enables fast morphological comparison of hundreds of thousands of neural reconstructions from any species, brain region, cell type, or preparation protocol. We compared the performance of different morphological measurements: 1) summary morphometrics calculated by L-Measure, 2) persistence vectors, a vectorized descriptor of branching structure, and 3) the combination of the two. In all cases, we also investigated the impact of applying dimensionality reduction using principal component analysis (PCA). We assessed qualitative performance by gauging the ability to rank neurons in order of visual similarity. Moreover, we quantified information content by examining explained variance, and we benchmarked the ability to identify occasional duplicate reconstructions of the same specimen. The results indicate that combining summary morphometrics and persistence vectors with applied PCA provides an information-rich characterization that enables efficient and precise comparison of neural morphology. Execution time scaled linearly with data set size, allowing seamless live searching through the entire NeuroMorpho.Org content in fractions of a second. We have deployed the similarity search function as an open-source online tool, available both through a user-friendly graphical interface and as an API for programmatic access.
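The ranking idea above (combine two feature families, reduce with PCA, then order neurons by closeness to a query) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeuroMorpho.Org implementation: the feature matrices are assumed to be precomputed with one row per reconstruction, z-scoring every column and using Euclidean distance in PCA space are assumed choices, and the function names are hypothetical.

```python
import numpy as np


def build_feature_matrix(morphometrics: np.ndarray, persistence: np.ndarray) -> np.ndarray:
    """Concatenate summary morphometrics and persistence vectors (one row per neuron)
    and z-score every column so that no single measurement dominates the distance."""
    features = np.hstack([morphometrics, persistence]).astype(float)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information; avoid dividing by zero
    return (features - features.mean(axis=0)) / std


def pca_project(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centered features onto their leading principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def rank_by_similarity(embedded: np.ndarray, query_index: int) -> np.ndarray:
    """Order all reconstructions by Euclidean distance to the query, nearest first.
    Computing the distances is one pass over the rows (linear in data set size);
    the final sort adds an n log n step."""
    distances = np.linalg.norm(embedded - embedded[query_index], axis=1)
    return np.argsort(distances, kind="stable")


if __name__ == "__main__":
    # Random placeholder data standing in for real L-Measure and persistence-vector features.
    rng = np.random.default_rng(0)
    morpho = rng.normal(size=(500, 20))
    persistence = rng.random(size=(500, 100))
    embedded = pca_project(build_feature_matrix(morpho, persistence), n_components=10)
    ranking = rank_by_similarity(embedded, query_index=42)
    print(ranking[:5])  # the query itself comes first, at distance 0
```

Standardizing before concatenation matters because morphometrics (lengths, counts) and persistence-vector entries live on very different scales; without it, PCA would mostly capture whichever family has the largest raw variance.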

Threat Hunting as a Similarity Search Problem on Multi-positive and Unlabeled Data

10.1109/bigdata52589.2021.9671958

2021

Author(s): Tomas Komarek, Jan Brabec, Cenek Skarda, Petr Somol

Keyword(s): Similarity Search, Unlabeled Data, Search Problem

Geoscience Language Processing for Exploration

10.2118/207766-ms

2021

Author(s): Huseyin Denli, Hassan A Chughtai, Brian Hughes, Robert Gistri, Peng Xu

Keyword(s): Language Processing, Similarity Search, Question Answering, Language Translation, Automated Analysis, General Purpose, Step Change, Domain Specific, Specific Meaning, Processing Solution

Deep learning, particularly with transformer models, has recently provided step-change capabilities for natural language processing applications such as question answering, query-based summarization, and language translation in general-purpose contexts. We have developed a geoscience-specific language processing solution using such models to enable geoscientists to perform rapid, fully quantitative, automated analysis of large corpora of data and gain insights. One of the key transformer-based models is BERT (Bidirectional Encoder Representations from Transformers), which is trained on a large amount of general-purpose text (e.g., Common Crawl). Using such a model for geoscience applications faces a number of challenges. One is the sparse presence of geoscience-specific vocabulary in general-purpose text (e.g., daily language); another is geoscience jargon (domain-specific meanings of words). For example, in daily language "salt" is most likely to mean table salt, whereas in geoscience it refers to a subsurface entity. To alleviate these challenges, we retrained a pre-trained BERT model on our 20M internal geoscientific records; we refer to the retrained model as GeoBERT. We fine-tuned GeoBERT for a number of tasks, including geoscience question answering and query-based summarization. BERT models are very large: BERT-Large, for example, has 340M trained parameters. Geoscience language processing with these models, including GeoBERT, could therefore incur substantial latency if the entire database were processed at every call of the model. To address this, we developed a retriever-reader engine in which an embedding-based similarity search serves as a context-retrieval step, narrowing the context for a given query before GeoBERT processes it. We built a solution integrating the context-retrieval and GeoBERT models. Benchmarks show that it effectively helps geologists identify answers and context for given questions. The prototype also produces summaries at different levels of granularity for a given set of documents. We have also demonstrated that the domain-specific GeoBERT outperforms general-purpose BERT for geoscience applications.
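The retriever half of a retriever-reader engine can be illustrated with a brute-force cosine-similarity search over passage embeddings. This is a sketch under assumptions, not the authors' system: the embedding vectors are hypothetical placeholders (a real deployment would obtain them from an encoder model and would likely use an approximate nearest-neighbor index), cosine similarity is an assumed scoring choice, and the reader model (GeoBERT in the text) is out of scope.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    passages: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Retriever step: return the k passages whose embeddings are closest to the query.
    Only these passages would be passed on to the expensive reader model."""
    scored = ((cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs))
    return [(passages[idx], score) for score, idx in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real sentence embeddings (hypothetical values).
    passages = ["salt dome trap", "table salt recipe", "seismic survey report"]
    vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.5, 0.0, 0.5]]
    for text, score in retrieve([1.0, 0.0, 0.1], vectors, passages, k=2):
        print(f"{score:.3f}  {text}")
```

In this toy run the query vector points toward the "geological salt" direction, so "salt dome trap" ranks first and "table salt recipe" is filtered out before the reader ever sees it, which is the latency saving the abstract describes.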

Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm

10.26434/chemrxiv-2021-hczsl

2021

Author(s): Yuxiang Chen, Chuanlei Liu, Yang An, Yue Lou, Yang Zhao, ...

Keyword(s): Machine Learning, Similarity Search, Data Science, Molecular Similarity, Molecular Design, Supervised Machine Learning, Training Dataset, Methyl Mercaptan, Model Framework, Modeling Framework

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields that increasingly rely on data science for efficiency. The typical method is supervised learning, which requires huge datasets. Semi-supervised machine learning approaches can exploit unlabeled data to improve modeling performance, but they are limited by the accumulation of prediction errors. Here, to screen solvents for the removal of methyl mercaptan, a type of organosulfur impurity in natural gas, we constructed a computational framework that integrates molecular similarity search and active learning, named molecular active selection machine learning (MASML). This framework identifies the optimal set of molecules through molecular similarity search and iterative addition to the training dataset. Among the 126,068 compounds in the initial dataset, 3 molecules were identified as promising for methyl mercaptan (MeSH) capture: benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Further experiments confirmed the effectiveness of the framework for efficient molecular design and identification for methyl mercaptan capture; DEAPA exhibits a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).
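A loop of "similarity search, then add the hits to the training set" can be sketched as below. This is a loose illustration, not the authors' MASML: fingerprints are assumed to be sets of "on" bit indices, Tanimoto similarity is an assumed (though common) molecular similarity measure, and `score` is a hypothetical stand-in for whatever supplies a label (a model prediction or an experiment). No model is retrained here.

```python
from typing import Callable, Dict, FrozenSet

Fingerprint = FrozenSet[int]  # hypothetical: indices of "on" bits in a molecular fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit-set fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_guided_selection(
    pool: Dict[str, Fingerprint],
    seed: Dict[str, float],
    score: Callable[[str], float],
    rounds: int,
    batch_size: int,
) -> Dict[str, float]:
    """Iteratively grow a labelled training set.

    Each round takes the best-scoring molecule labelled so far as the query,
    runs a similarity search over the unlabelled pool, labels the top
    `batch_size` hits with `score`, and adds them to the training set.
    Seed names must be keys of `pool`.
    """
    training = dict(seed)
    for _ in range(rounds):
        unlabelled = [name for name in pool if name not in training]
        if not unlabelled:
            break
        query = max(training, key=training.get)
        hits = sorted(
            unlabelled,
            key=lambda name: tanimoto(pool[query], pool[name]),
            reverse=True,
        )[:batch_size]
        for name in hits:
            training[name] = score(name)
    return training


if __name__ == "__main__":
    # Hypothetical fingerprints and scores (higher = better capture performance).
    pool = {
        "seed": frozenset({1, 2, 3, 8}),
        "cand_a": frozenset({1, 2, 3, 9}),
        "cand_b": frozenset({4, 5, 6}),
        "cand_c": frozenset({1, 2, 8}),
    }
    toy_scores = {"cand_a": 0.7, "cand_b": 0.2, "cand_c": 0.9}
    print(similarity_guided_selection(pool, {"seed": 0.5}, toy_scores.__getitem__, rounds=2, batch_size=1))
```

Re-centering the search on the current best molecule each round is what lets the loop walk through chemical space toward better candidates while labelling only a small fraction of the pool.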

CTKG: A Knowledge Graph for Clinical Trials

10.1101/2021.11.04.21265952

2021

Author(s): Ziqi Chen, Bo Peng, Vassilis N. Ioannidis, Mufei Li, George Karypis, ...

Keyword(s): Clinical Trials, Success Rate, Similarity Search, Drug Repurposing, New Drugs, Knowledge Graph, New Treatments

Effective and successful clinical trials are essential for developing new drugs and advancing new treatments. However, clinical trials are very expensive and prone to failure. Their high cost and low success rate motivate research on inferring knowledge from existing clinical trials in innovative ways to design future ones. In this manuscript, we present our efforts to construct the first publicly available Clinical Trials Knowledge Graph, denoted CTKG. CTKG includes nodes representing medical entities in clinical trials (e.g., studies, drugs, and conditions) and edges representing the relations among these entities (e.g., drugs used in studies). Our embedding analysis demonstrates the potential utility of CTKG in various applications such as drug repurposing and similarity search.

Prevalence and molecular identification of Mycobacteria isolated from animals slaughtered at Sokoto modern abattoir, Sokoto State, Nigeria

Sokoto Journal of Veterinary Sciences

10.4314/sokjvs.v19i3.7

2021

Vol 19 (3), pp. 217-224

Author(s): A.I. Musawa, A.A. Magaji, M.D. Salihu, A.C. Kudi, A.U. Junaidu, ...

Keyword(s): Similarity Search, PCR Amplification, Genomic Region, Specific Primers, Suspected Tuberculosis, Lowenstein-Jensen, hsp65 Gene, Polymerase Chain, NCBI BLAST, Control and Eradication

This study investigated the molecular epidemiology of Mycobacteria isolated from animals slaughtered at Sokoto modern abattoir. During meat inspection, 104 suspected tuberculosis lesions were sampled from a total of 102,681 animals slaughtered between November 2016 and January 2018. These samples were subjected to Ziehl-Neelsen staining, followed by culture on Lowenstein-Jensen media. Subsequently, polymerase chain reaction (PCR) and sequencing of the 65 kDa heat shock protein (hsp65) gene were performed to identify and phylogenetically characterize the cultured organisms. Because sequencing of the hsp65 gene could not distinguish Mycobacterium bovis (M. bovis) from M. tuberculosis, PCR was performed to amplify a genomic region specific to M. bovis in order to differentiate the two. Results showed that 14 samples yielded growth after culture. Furthermore, hsp65 was detected in 9 of the 14 isolates screened, and 5 of the amplicons were successfully sequenced. Similarity search using the NCBI BLAST tool showed the five sequences to share the highest identities with Mycobacterium novocastrense (95.99%), M. canettii (94.54%), and M. tuberculosis/M. bovis (100%). Two of the 5 isolates were confirmed to be M. bovis after PCR amplification using M. bovis-specific primers. The phylogenetic tree further confirmed the identity of these isolates by placing them close to related species. Further studies should be conducted to establish the transmission dynamics of zoonotic Mycobacteria between animals and their owners, to facilitate the control and eradication of tuberculosis.
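The identity percentages reported above are computed from a pairwise alignment between the query sequence and a database hit. BLAST itself finds those alignments with heuristic local-alignment search; the sketch below covers only the final tally on an already-aligned pair and is not BLAST's code. It assumes gaps are written as "-" and uses one common convention (identical non-gap columns divided by alignment length); the sequences in the demo are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of a pairwise alignment.

    Uses one common convention: identical, non-gap columns divided by the total
    alignment length (gap columns count toward the length but never as matches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Made-up aligned fragments; '-' marks a gap inserted by the aligner.
    query = "ATGGCC-GTAC"
    subject = "ATGGCCAGTTC"
    print(f"{percent_identity(query, subject):.2f}%")
```

A 100% identity, as reported for the M. tuberculosis/M. bovis hits, means every aligned column matched, which is also why hsp65 sequence alone could not separate those two closely related species.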

Why-not questions about spatial temporal top-k trajectory similarity search

Knowledge-Based Systems

10.1016/j.knosys.2021.107407

2021

Vol 231, pp. 107407

Author(s): Changyin Luo, Tangpeng Dan, Yanhong Li, Xiaofeng Meng, Guohui Li

Keyword(s): Similarity Search, Trajectory Similarity

Picture semantic similarity search based on bipartite network of picture-tag type

PLoS ONE

10.1371/journal.pone.0259028

2021

Vol 16 (11), pp. e0259028

Author(s): Mingxi Zhang, Liuqian Yang, Yipeng Dong, Jinhua Wang, Qinghan Zhang

Keyword(s): Image Retrieval, Image Classification, Semantic Similarity, Similarity Search, Recommendation System, Search Method, Visual Features, Bipartite Network, Effectiveness And Efficiency, Image Recommendation

Searching for pictures similar to a given picture is an important task in numerous applications, including image recommendation systems, image classification, and image retrieval. Previous studies mainly focused on content similarity, measured from visual features such as color and shape, and few of them paid enough attention to semantics. In this paper, we propose a link-based semantic similarity search method, PictureSim, that effectively searches for similar pictures by building a picture-tag network. The picture-tag network is built from “description” relationships between pictures and tags: pictures and tags are treated as nodes, and the relationships between them are treated as edges. We then design a TF-IDF-based model to remove noisy links, which reduces the number of link traversals. We observe that “similar pictures contain similar tags, and similar tags describe similar pictures”, which is consistent with the intuition behind SimRank. Consequently, we use the SimRank algorithm to compute similarity scores between pictures. Compared with content-based methods, PictureSim effectively searches for similar pictures semantically. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of PictureSim.
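The mutual-reinforcement intuition quoted above is exactly what the bipartite form of SimRank iterates: a picture pair's score is the decayed average similarity of their tag pairs, and a tag pair's score is the decayed average similarity of their picture pairs. The sketch below is a naive version under assumptions: the decay factor 0.8 and five iterations are illustrative defaults, the TF-IDF link-pruning step is omitted, and the abstract does not state PictureSim's exact update rule or optimizations. Its cost grows quadratically in both node count and degree, so it suits only toy graphs.

```python
from itertools import product
from typing import Dict, Set, Tuple


def bipartite_simrank(
    picture_tags: Dict[str, Set[str]],
    decay: float = 0.8,
    iterations: int = 5,
) -> Dict[Tuple[str, str], float]:
    """Naive bipartite SimRank over a picture-tag graph.

    Two pictures are similar if they are described by similar tags, and two tags
    are similar if they describe similar pictures. Each round recomputes both
    score tables from the previous round's values.
    """
    tag_pictures: Dict[str, Set[str]] = {}
    for picture, its_tags in picture_tags.items():
        for tag in its_tags:
            tag_pictures.setdefault(tag, set()).add(picture)
    pictures, tags = sorted(picture_tags), sorted(tag_pictures)

    # Base case: every node is maximally similar to itself and to nothing else.
    pic_sim = {(a, b): float(a == b) for a, b in product(pictures, repeat=2)}
    tag_sim = {(a, b): float(a == b) for a, b in product(tags, repeat=2)}

    for _ in range(iterations):
        new_pic = {}
        for a, b in product(pictures, repeat=2):
            ta, tb = picture_tags[a], picture_tags[b]
            if a == b or not ta or not tb:
                new_pic[a, b] = float(a == b)
                continue
            total = sum(tag_sim[i, j] for i in ta for j in tb)
            new_pic[a, b] = decay * total / (len(ta) * len(tb))
        new_tag = {}
        for a, b in product(tags, repeat=2):
            if a == b:
                new_tag[a, b] = 1.0
                continue
            pa, pb = tag_pictures[a], tag_pictures[b]  # never empty by construction
            total = sum(pic_sim[i, j] for i in pa for j in pb)
            new_tag[a, b] = decay * total / (len(pa) * len(pb))
        pic_sim, tag_sim = new_pic, new_tag
    return pic_sim


if __name__ == "__main__":
    # Hypothetical tagged pictures; p1 and p2 share the tag "sea", p3 shares nothing.
    graph = {"p1": {"beach", "sea"}, "p2": {"sea", "sun"}, "p3": {"city"}}
    scores = bipartite_simrank(graph)
    for other in ("p2", "p3"):
        print("p1 vs", other, round(scores["p1", other], 3))
```

Because scores propagate through shared neighbors, p1 and p2 become more similar with each round as "beach" and "sun" gain similarity through the pictures they describe, while p3, which shares no path with them, stays at 0.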

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper)

10.1109/iccad51958.2021.9643528

2021

Author(s): Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, ...

Keyword(s): Similarity Search, Large Scale, Molecular Similarity, Special Session, Session Paper, Accelerator Design

Geographic Information Systems (GIS) as a Tool for Positive Identification from Frontal Sinus Radiographs

Forensic Anthropology

10.5744/fa.2020.0042

2021

Author(s): Jenna Watson

Keyword(s): Frontal Sinus, Similarity Search, Comparison Method, Error Rates, Sufficient Information, Analysis Tool, Positive Identification, True Match, Area And Perimeter, User Friendly

Frontal sinus radiographs are frequently used to identify human remains. However, the method of visually comparing antemortem (AM) to postmortem (PM) cranial radiographs has been criticized as a subjective approach that relies on practitioner experience, training, and judgment rather than on objective, quantifiable procedures with published error rates. The objective of this study was to explore the use of ArcMap and its spatial analysis tool, Similarity Search, as a quantifiable, reliable, and reproducible method for identifying frontal sinus matches from cranial radiographs. Using cranial radiographs of 100 individuals from the William M. Bass Donated Skeletal Collection, the frontal sinuses were digitized to create two-dimensional polygons. Similarity Search was evaluated on its ability to identify the correct AM radiograph using three variables: the number of scallops and the area and perimeter of the polygons. Using all three variables, Similarity Search correctly identified the true-match AM polygon in 58% of the male groups and 62% of the female groups. These results indicate that ArcMap can be used with frontal sinus radiographs. However, further analysis of the three variables revealed that scallop number did not provide sufficient information about frontal sinus shape to increase the accuracy of Similarity Search, and that area and perimeter captured only the size of the frontal sinus polygons, not their shape. This research is a first step toward a user-friendly, quantifiable frontal sinus comparison method for positive identification.
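The matching procedure described above (derive area and perimeter from a digitized outline, then rank AM records by how closely their attributes match a PM record) can be sketched as follows. This is not Esri's Similarity Search implementation: the ranking criterion here, a sum of squared differences after scaling each attribute by its standard deviation, is an assumed simple analogue, the function names are hypothetical, and the scallop count is taken as an observer-recorded input rather than computed from the polygon.

```python
import math
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices: Sequence[Point]) -> float:
    """Perimeter of a closed polygon: sum of edge lengths, wrapping back to the start."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def rank_candidates(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate records by the sum of squared standardized attribute differences.

    Each attribute is scaled by its population standard deviation over all records
    (query included), so counts, areas, and lengths become comparable. Lower is more similar.
    """
    names = list(candidates)
    spreads = []
    for k in range(len(query)):
        column = [candidates[name][k] for name in names] + [query[k]]
        sd = pstdev(column)
        spreads.append(sd if sd > 0 else 1.0)
    scored = [
        (name, sum(((query[k] - candidates[name][k]) / spreads[k]) ** 2 for k in range(len(query))))
        for name in names
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical PM outline; scallop counts would be recorded by an observer, not computed.
    pm_outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 3.0), (0.0, 2.0)]
    pm = (3, polygon_area(pm_outline), polygon_perimeter(pm_outline))
    am_records = {"AM-01": (3, 10.5, 13.0), "AM-02": (5, 18.0, 17.5), "AM-03": (2, 9.0, 12.0)}
    for name, score in rank_candidates(pm, am_records):
        print(name, round(score, 3))
```

Because area and perimeter summarize size, two outlines with equal size but different shapes would receive identical scores in this sketch, which mirrors the limitation the study reports for these variables.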
