Similarity Search in Large-Scale Graph Databases

2017 ◽  
pp. 507-529 ◽  
Author(s):  
Peixiang Zhao
2019 ◽  
Vol 1 (4) ◽  
pp. 333-349 ◽  
Author(s):  
Peilu Wang ◽  
Hao Jiang ◽  
Jingfang Xu ◽  
Qi Zhang

Knowledge graphs (KGs) have played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce our solution for building a large-scale multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the data of the Sogou knowledge graph, which are collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering and knowledge-based dialog systems. These applications have been used in Web search products to help users acquire information more efficiently.


2021 ◽  
Author(s):  
Bengt Ljungquist ◽  
Masood A Akram ◽  
Giorgio A Ascoli

Most functions of the nervous system depend on neuronal and glial morphology. Continuous advances in microscopic imaging and tracing software have provided an increasingly abundant availability of 3D reconstructions of arborizing dendrites, axons, and processes, allowing their detailed study. However, efficient, large-scale methods to rank neural morphologies by similarity to an archetype are still lacking. Using the NeuroMorpho.Org database, we present a similarity search software enabling fast morphological comparison of hundreds of thousands of neural reconstructions from any species, brain regions, cell types, and preparation protocols. We compared the performance of different morphological measurements: 1) summary morphometrics calculated by L-Measure, 2) persistence vectors, a vectorized descriptor of branching structure, and 3) the combination of the two. In all cases, we also investigated the impact of applying dimensionality reduction using principal component analysis (PCA). We assessed qualitative performance by gauging the ability to rank neurons in order of visual similarity. Moreover, we quantified information content by examining explained variance and benchmarked the ability to identify occasional duplicate reconstructions of the same specimen. The results indicate that combining summary morphometrics and persistence vectors with applied PCA provides an information-rich characterization that enables efficient and precise comparison of neural morphology. The execution time scaled linearly with data set size, allowing seamless live searching through the entire NeuroMorpho.Org content in fractions of a second. We have deployed the similarity search function as an open-source online software tool both through a user-friendly graphical interface and as an API for programmatic access.
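
The ranking and duplicate-detection steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' software: the feature names, the toy values, the `rank_by_similarity` helper, and the duplicate threshold are all hypothetical, and the PCA step is left out for brevity, so plain z-scored Euclidean distance stands in for distance in the reduced space.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single measurement dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def rank_by_similarity(features, query_id):
    """Return (id, distance) pairs sorted from most to least similar to the query."""
    ids = list(features)
    z = dict(zip(ids, standardize([features[i] for i in ids])))
    q = z[query_id]
    scored = [(i, math.dist(q, z[i])) for i in ids if i != query_id]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy descriptors: [total length, number of branches, max branch order]
features = {
    "neuron_a": [1200.0, 35.0, 6.0],
    "neuron_b": [1180.0, 34.0, 6.0],
    "neuron_c": [300.0, 8.0, 3.0],
    "neuron_a_dup": [1200.0, 35.0, 6.0],
}

ranking = rank_by_similarity(features, "neuron_a")
# Entries at (near) zero distance are candidate duplicate reconstructions.
duplicates = [i for i, d in ranking if d < 1e-9]
```

Because each query is a single pass over the stored vectors, the cost of this brute-force approach grows linearly with the number of reconstructions, which is consistent with the scaling the abstract reports.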


2018 ◽  
Vol 5 (1) ◽  
pp. 24-34
Author(s):  
I. P. Bangov ◽  
M. Moskovkina ◽  
B. P. Stojanov

This study continues the attempt to apply a statistical process to large-scale analytical data. A group of 3898 white wines, each with 11 analytical laboratory benchmarks, was analyzed by a fingerprint similarity search in order to group them into separate clusters. The quality of the wines in each cluster was then characterized according to individual laboratory parameters.


2021 ◽  
Vol 11 (17) ◽  
pp. 7782
Author(s):  
Itziar Urbieta ◽  
Marcos Nieto ◽  
Mikel García ◽  
Oihana Otaegui

Modern Artificial Intelligence (AI) methods can produce a large quantity of accurate and richly described data, in domains such as surveillance or automation. As a result, the need to organize data at a large scale in a semantic structure has arisen for long-term data maintenance and consumption. Ontologies and graph databases have gained popularity as mechanisms to satisfy this need. Ontologies provide the means to formally structure descriptive and semantic relations of a domain. Graph databases allow efficient and well-adapted storage, manipulation, and consumption of these linked data resources. However, at present, there is no universally defined strategy for building AI-oriented ontologies for the automotive sector. One of the key challenges is the lack of a global standardized vocabulary. Most private initiatives and large open datasets for Advanced Driver Assistance Systems (ADASs) and Autonomous Driving (AD) development include their own definitions of terms, with incompatible taxonomies and structures, thus resulting in a well-known lack of interoperability. This paper presents the Automotive Global Ontology (AGO) as a Knowledge Organization System (KOS) using a graph database (Neo4j). Two different use cases for the AGO domain ontology are presented to showcase its capabilities in terms of semantic labeling and scenario-based testing. The ontology and related material have been made public for their subsequent use by the industry and academic communities.


2014 ◽  
Vol 24 (2) ◽  
pp. 271-296 ◽  
Author(s):  
Ye Yuan ◽  
Guoren Wang ◽  
Lei Chen ◽  
Haixun Wang
