similarity search
Recently Published Documents


TOTAL DOCUMENTS

1441
(FIVE YEARS 263)

H-INDEX

52
(FIVE YEARS 9)

2021 ◽  
Author(s):  
Bengt Ljungquist ◽  
Masood A Akram ◽  
Giorgio A Ascoli

Most functions of the nervous system depend on neuronal and glial morphology. Continuous advances in microscopic imaging and tracing software have made 3D reconstructions of arborizing dendrites, axons, and processes increasingly available, allowing their detailed study. However, efficient, large-scale methods to rank neural morphologies by similarity to an archetype are still lacking. Using the NeuroMorpho.Org database, we present similarity search software enabling fast morphological comparison of hundreds of thousands of neural reconstructions from any species, brain region, cell type, and preparation protocol. We compared the performance of different morphological measurements: 1) summary morphometrics calculated by L-Measure, 2) persistence vectors, a vectorized descriptor of branching structure, and 3) the combination of the two. In all cases, we also investigated the impact of applying dimensionality reduction using principal component analysis (PCA). We assessed qualitative performance by gauging the ability to rank neurons in order of visual similarity. Moreover, we quantified information content by examining explained variance and benchmarked the ability to identify occasional duplicate reconstructions of the same specimen. The results indicate that combining summary morphometrics and persistence vectors with applied PCA provides an information-rich characterization that enables efficient and precise comparison of neural morphology. The execution time scaled linearly with data set size, allowing seamless live searching through the entire NeuroMorpho.Org content in fractions of a second. We have deployed the similarity search function as an open-source online software tool, both through a user-friendly graphical interface and as an API for programmatic access.
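The pipeline this abstract describes (concatenate two descriptors, reduce with PCA, rank by distance to a query) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the NeuroMorpho.Org implementation: the feature values, the function names, and the choice of Euclidean distance are all assumptions made for the example.

```python
import numpy as np

def fit_pca(features, n_components):
    """Standardize columns, then derive principal axes via SVD."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = (features - mean) / std
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    variance = singular_values ** 2
    explained = (variance / variance.sum())[:n_components]
    return mean, std, vt[:n_components], explained

def project(features, mean, std, components):
    """Map raw feature rows into the reduced PCA space."""
    return ((features - mean) / std) @ components.T

def rank_by_similarity(query_vec, database, top_k):
    """Return (row index, distance) pairs, nearest first (Euclidean, assumed)."""
    distances = np.linalg.norm(database - query_vec, axis=1)
    order = np.argsort(distances)[:top_k]
    return [(int(i), float(distances[i])) for i in order]

# Hypothetical toy data: two summary morphometrics and a 2-element
# persistence vector per reconstruction (values invented for illustration).
morphometrics = np.array([[100, 5], [102, 5], [400, 20], [390, 22], [50, 2]], dtype=float)
persistence = np.array([[0.10, 0.90], [0.10, 0.88], [0.80, 0.20], [0.75, 0.25], [0.05, 0.95]])
combined = np.hstack([morphometrics, persistence])

mean, std, components, explained = fit_pca(combined, n_components=2)
reduced = project(combined, mean, std, components)
hits = rank_by_similarity(reduced[0], reduced, top_k=3)
```

Here row 1 is a near-duplicate of row 0, so it ranks directly behind the query itself, which mirrors the duplicate-detection benchmark mentioned in the abstract. A brute-force scan like this costs time linear in the number of stored reconstructions.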


Author(s):  
Tomas Komarek ◽  
Jan Brabec ◽  
Cenek Skarda ◽  
Petr Somol

2021 ◽  
Author(s):  
Huseyin Denli ◽  
Hassan A Chughtai ◽  
Brian Hughes ◽  
Robert Gistri ◽  
Peng Xu

Deep learning has recently been providing step-change capabilities, particularly using transformer models, for natural language processing applications such as question answering, query-based summarization, and language translation in general-purpose contexts. We have developed a geoscience-specific language processing solution using such models to enable geoscientists to perform rapid, fully quantitative and automated analysis of large corpora of data and gain insights. One of the key transformer-based models is BERT (Bidirectional Encoder Representations from Transformers). It is trained with a large amount of general-purpose text (e.g., Common Crawl). Using such a model for geoscience applications faces a number of challenges. One is the scarcity of geoscience-specific vocabulary in general-purpose text (e.g., daily language); the other is geoscience jargon (domain-specific meanings of words). For example, salt is more likely to be associated with table salt in daily language, but it refers to a subsurface entity in geosciences. To alleviate these challenges, we retrained a pre-trained BERT model on our 20M internal geoscientific records. We refer to the retrained model as GeoBERT. We fine-tuned the GeoBERT model for a number of tasks, including geoscience question answering and query-based summarization. BERT models are very large. For example, BERT-Large has 340M trained parameters. Geoscience language processing with these models, including GeoBERT, could incur substantial latency if the entire database is processed at every call of the model. To address this challenge, we developed a retriever-reader engine that uses an embedding-based similarity search as a context retrieval step, which narrows the context for a given query before that context is processed with GeoBERT. We built a solution integrating the context-retrieval and GeoBERT models. Benchmarks show that it effectively helps geologists identify answers and context for given questions. The prototype also produces summaries at different levels of granularity for a given set of documents. We have also demonstrated that domain-specific GeoBERT outperforms general-purpose BERT for geoscience applications.
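The retrieval step of a retriever-reader engine, as described in this abstract, might look like the sketch below. It is only an illustration under stated assumptions: `embed` is a stand-in bag-of-words encoder (a real system would use a learned embedding model), the passages are invented, and the reader model that would consume the retrieved context is not shown.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts. Assumed for illustration only."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, top_k=2):
    """Return the top_k passages most similar to the query."""
    query_vec = embed(query)
    scored = sorted(
        ((cosine(query_vec, embed(p)), i) for i, p in enumerate(passages)),
        reverse=True,
    )
    return [passages[i] for _, i in scored[:top_k]]

# Hypothetical passages; only the retrieved subset would be handed to the reader.
passages = [
    "salt domes form structural traps in the subsurface",
    "table salt is used to season food",
    "sandstone reservoirs have high porosity",
]
context = retrieve("where do salt domes form traps", passages, top_k=1)
```

The point of the design is that the expensive reader processes only `context` rather than the whole collection, which is how the abstract motivates the retrieval step.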


2021 ◽  
Author(s):  
Yuxiang Chen ◽  
Chuanlei Liu ◽  
Yang An ◽  
Yue Lou ◽  
Yang Zhao ◽  
...  

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields that increasingly rely on data science for efficiency. The typical method is supervised learning, which needs huge datasets. Semi-supervised machine learning approaches can train on unlabeled data and improve modeling performance, but they are limited by the accumulation of prediction errors. Here, to screen solvents for the removal of methyl mercaptan, an organosulfur impurity in natural gas, we constructed a computational framework that integrates molecular similarity search and active learning methods, namely molecular active selection machine learning (MASML). This new framework identifies the optimal set of molecules through molecular similarity search and iterative addition to the training dataset. Among all 126,068 compounds in the initial dataset, 3 molecules were identified as promising for methyl mercaptan (MeSH) capture: benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Further experiments confirmed the effectiveness of our modeling framework in efficient molecular design and identification for capturing methyl mercaptan, in which DEAPA presents a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).
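A similarity-driven selection loop of the general kind this abstract describes can be sketched as below. The abstract does not specify the selection rule, so everything here is an assumption: fingerprints are modeled as plain sets of feature bits, similarity is the Tanimoto coefficient, and each round moves the most similar candidate into the training set (the labeling and retraining that would happen between rounds are omitted). The molecule names and bit sets are placeholders, not real compounds.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_batch(training, pool, batch_size):
    """Pick pool entries with the highest similarity to any training molecule."""
    def best_match(name):
        return max(tanimoto(pool[name], fp) for fp in training.values())
    return sorted(pool, key=best_match, reverse=True)[:batch_size]

# Hypothetical fingerprints (sets of "on" bits), invented for illustration.
training = {"seed": {1, 2, 3, 4}}
pool = {
    "cand_a": {1, 2, 3, 5},
    "cand_b": {1, 6, 7, 8},
    "cand_c": {2, 3, 9, 12},
    "cand_d": {10, 11},
}

selected = []
for _ in range(2):  # two selection rounds
    batch = select_batch(training, pool, batch_size=1)
    for name in batch:
        # In a real workflow the molecule would be labeled and the model retrained here.
        training[name] = pool.pop(name)
        selected.append(name)
```

Growing the training set from the most similar candidates first is one plausible way to limit the error accumulation that the abstract attributes to semi-supervised approaches.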


2021 ◽  
Author(s):  
Ziqi Chen ◽  
Bo Peng ◽  
Vassilis N. Ioannidis ◽  
Mufei Li ◽  
George Karypis ◽  
...  

Effective and successful clinical trials are essential in developing new drugs and advancing new treatments. However, clinical trials are very expensive and prone to failure. The high cost and low success rate of clinical trials motivate research on inferring knowledge from existing clinical trials in innovative ways for designing future clinical trials. In this manuscript, we present our efforts to construct the first publicly available Clinical Trials Knowledge Graph, denoted as CTKG. CTKG includes nodes representing medical entities in clinical trials (e.g., studies, drugs and conditions), and edges representing the relations among these entities (e.g., drugs used in studies). Our embedding analysis demonstrates the potential utility of CTKG in various applications such as drug repurposing and similarity search, among others.


2021 ◽  
Vol 19 (3) ◽  
pp. 217-224
Author(s):  
A.I. Musawa ◽  
A.A. Magaji ◽  
M.D. Salihu ◽  
A.C. Kudi ◽  
A.U. Junaidu ◽  
...  

This study investigated the molecular epidemiology of Mycobacteria isolated from animals slaughtered at Sokoto modern abattoir. During meat inspection, 104 suspected tuberculosis lesions were sampled from a total of 102,681 animals slaughtered between November 2016 and January 2018. These samples were subjected to Ziehl-Neelsen staining, followed by culture on Lowenstein-Jensen media. Subsequently, polymerase chain reaction (PCR) and sequencing of the 65 kDa heat shock protein (hsp65) gene were performed to identify and phylogenetically characterize the cultured organisms. Because sequencing of the hsp65 gene was unable to distinguish between Mycobacterium bovis (M. bovis) and M. tuberculosis, PCR was performed to amplify a genomic region specific to M. bovis in order to differentiate it from M. tuberculosis. Results showed that 14 samples yielded growth after culture. Furthermore, hsp65 was detected in 9 of the 14 isolates screened, and 5 of the amplicons were successfully sequenced. Similarity search using the NCBI BLAST tool showed the five sequences to share highest identities with Mycobacterium novocastrense (95.99%), M. canettii (94.54%), and M. tuberculosis/M. bovis (100%). Two of the 5 isolates were confirmed to be M. bovis after PCR amplification using M. bovis-specific primers. The phylogenetic tree further confirmed the identity of these isolates by placing them close to species of their kind. Further studies should be conducted to establish the transmission dynamics of the zoonotic Mycobacteria between animals and their owners, to facilitate control and eradication of tuberculosis.


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259028
Author(s):  
Mingxi Zhang ◽  
Liuqian Yang ◽  
Yipeng Dong ◽  
Jinhua Wang ◽  
Qinghan Zhang

Searching for pictures similar to a given picture is an important task in numerous applications, including image recommendation systems, image classification and image retrieval. Previous studies mainly focused on content similarity, which measures similarity based on visual features such as color and shape, and few pay enough attention to semantics. In this paper, we propose a link-based semantic similarity search method, namely PictureSim, for effectively searching similar pictures by building a picture-tag network. The picture-tag network is built from “description” relationships between pictures and tags, in which tags and pictures are treated as nodes, and relationships between pictures and tags are regarded as edges. We then design a TF-IDF-based model to remove noisy links, which reduces the number of links that must be traversed. We observe that “similar pictures contain similar tags, and similar tags describe similar pictures”, which is consistent with the intuition behind SimRank. Consequently, we utilize the SimRank algorithm to compute the similarity scores between pictures. Compared with content-based methods, PictureSim can effectively search similar pictures semantically. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of PictureSim.
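The SimRank recursion applied to a picture-tag network can be illustrated with the short sketch below. It is a plain iterative SimRank on an undirected bipartite graph, written for clarity rather than efficiency; it is not the PictureSim implementation, it omits the TF-IDF link-pruning step, and the decay factor, iteration count, and toy pictures and tags are assumptions.

```python
def build_neighbors(edges):
    """Undirected adjacency sets from (picture, tag) pairs."""
    neighbors = {}
    for picture, tag in edges:
        neighbors.setdefault(picture, set()).add(tag)
        neighbors.setdefault(tag, set()).add(picture)
    return neighbors

def simrank(neighbors, decay=0.8, iterations=10):
    """Iterative SimRank: two nodes are similar if their neighbors are similar."""
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                    continue
                # Average similarity over all neighbor pairs, scaled by the decay factor.
                total = sum(sim[(x, y)] for x in neighbors[a] for y in neighbors[b])
                updated[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = updated
    return sim

# Hypothetical picture-tag "description" edges.
edges = [("p1", "beach"), ("p1", "sea"), ("p2", "sea"), ("p2", "sunset"), ("p3", "city")]
scores = simrank(build_neighbors(edges))
```

In this toy graph `p1` and `p2` share the tag `sea`, so they receive a positive score, while `p3` has no tag in common with either and stays at zero. The all-pairs loop is quadratic in the number of nodes, which is one reason pruning noisy links before traversal matters at scale.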


2021 ◽  
Vol 231 ◽  
pp. 107407
Author(s):  
Changyin Luo ◽  
Tangpeng Dan ◽  
Yanhong Li ◽  
Xiaofeng Meng ◽  
Guohui Li

2021 ◽  
Author(s):  
Hongwu Peng ◽  
Shiyang Chen ◽  
Zhepeng Wang ◽  
Junhuan Yang ◽  
Scott A. Weitze ◽  
...  

2021 ◽  
Author(s):  
Jenna Watson

Frontal sinus radiographs are frequently used to identify human remains. However, the method of visually comparing antemortem (AM) to postmortem (PM) cranial radiographs has been criticized for being a subjective approach that relies on practitioner experience, training, and judgment rather than on objective, quantifiable procedures with published error rates. The objective of this study was to explore the use of ArcMap and its spatial analysis tool, Similarity Search, as a quantifiable, reliable, and reproducible method for identifying frontal sinus matches from cranial radiographs. Using cranial radiographs of 100 individuals from the William M. Bass Donated Skeletal Collection, the frontal sinuses were digitized to create two-dimensional polygons. Similarity Search was evaluated on its ability to identify the correct AM radiograph using three variables: the number of scallops and the area and perimeter values of the polygons. Using all three variables, Similarity Search correctly identified the true match AM polygon in 58% of the male groups and in 62% of the female groups. These results indicate that ArcMap can be used with frontal sinus radiographs. However, further analysis of the three variables revealed that scallop number did not provide sufficient information about frontal sinus shape to increase the accuracy of Similarity Search, and area and perimeter only captured the size of the frontal sinus polygons, not shape. This research is a first step in developing a user-friendly, quantifiable frontal sinus comparison method for the purpose of positive identification.
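Ranking candidates on a small set of attributes, as done here with scallop count, area, and perimeter, can be sketched in a few lines. This is not ArcMap's Similarity Search code: it assumes a simple scheme in which each attribute is standardized and candidates are ordered by the sum of squared differences from the query, and all measurements and identifiers below are invented.

```python
import statistics

def zscore_columns(rows):
    """Standardize each attribute so that no single unit dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def rank_candidates(query, candidates):
    """Order candidate names by squared standardized distance to the query."""
    names = list(candidates)
    standardized = zscore_columns([query] + [candidates[n] for n in names])
    z_query, z_candidates = standardized[0], standardized[1:]
    scored = [
        (sum((q - c) ** 2 for q, c in zip(z_query, row)), name)
        for name, row in zip(names, z_candidates)
    ]
    return [name for _, name in sorted(scored)]

# Hypothetical values: (scallop count, area, perimeter) for one PM polygon
# and three AM polygons.
pm_polygon = (4, 610.0, 118.0)
am_polygons = {
    "AM-01": (2, 300.0, 80.0),
    "AM-02": (4, 600.0, 120.0),
    "AM-03": (6, 900.0, 150.0),
}
ranking = rank_candidates(pm_polygon, am_polygons)
```

Because area and perimeter both grow with polygon size, two sinuses of similar size but different outline would score as close under this scheme, which is consistent with the limitation the abstract reports.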

