indexing structure
Recently Published Documents


TOTAL DOCUMENTS

137
(FIVE YEARS 23)

H-INDEX

9
(FIVE YEARS 1)

2021 ◽  
Vol 11 (20) ◽  
pp. 9581
Author(s):  
Wei Wang ◽  
Yi Zhang ◽  
Genyu Ge ◽  
Qin Jiang ◽  
Yang Wang ◽  
...  

The spatial index structure is one of the most important research topics in organizing and managing massive 3D point clouds. Because each point in a point cloud consists of Cartesian coordinates (x, y, z), the common way to explore geometric information and features is nearest neighbor searching. An efficient spatial indexing structure directly affects the speed of the nearest neighbor search. Octree and kd-tree are the structures most commonly used for point cloud data. However, neither Octree nor kd-tree performs best in nearest neighbor searching. A highly balanced tree, the 3D R*-tree, is considered the most effective method so far. A hybrid spatial indexing structure based on Octree and 3D R*-tree is therefore proposed. In this paper, we discuss how thresholds influence the performance of nearest neighbor searching and tree construction, and we adopt an adaptive method to set the thresholds. With this method, we obtained better performance in tree construction and nearest neighbor searching than Octree and 3D R*-tree.
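To make the role of the threshold in the abstract above concrete, here is a minimal sketch of an octree whose cells are split only while they hold more than `threshold` points, followed by a nearest neighbor search that prunes cells by their bounding-box distance. It is an illustration under assumptions, not the authors' structure: the leaves here are plain lists rather than 3D R*-trees, the names `build`, `nearest`, `box_dist2`, and `max_depth` are hypothetical, and all points are assumed to lie inside the root box.

```python
import math

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi   # corners of the axis-aligned cell
        self.points = []            # filled only in leaves
        self.children = None        # list of child nodes, or None for a leaf

def build(points, lo, hi, threshold, depth=0, max_depth=16):
    """Split a cell into octants while it holds more than `threshold` points."""
    node = Node(lo, hi)
    if len(points) <= threshold or depth == max_depth:
        node.points = list(points)
        return node
    mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
    buckets = {}
    for p in points:
        # The key says, per axis, whether the point is in the upper half.
        key = tuple(p[i] >= mid[i] for i in range(3))
        buckets.setdefault(key, []).append(p)
    node.children = []
    for key, pts in buckets.items():  # only non-empty octants are created
        clo = tuple(mid[i] if key[i] else lo[i] for i in range(3))
        chi = tuple(hi[i] if key[i] else mid[i] for i in range(3))
        node.children.append(build(pts, clo, chi, threshold, depth + 1, max_depth))
    return node

def box_dist2(q, lo, hi):
    """Squared distance from q to the nearest point of the box [lo, hi]."""
    return sum(max(lo[i] - q[i], 0.0, q[i] - hi[i]) ** 2 for i in range(3))

def nearest(node, q, best=(math.inf, None)):
    """Return (squared distance, point) of the nearest neighbor of q."""
    if box_dist2(q, node.lo, node.hi) >= best[0]:
        return best                 # this cell cannot contain a closer point
    if node.children is None:
        for p in node.points:
            d2 = sum((p[i] - q[i]) ** 2 for i in range(3))
            if d2 < best[0]:
                best = (d2, p)
        return best
    # Visit closer cells first so that farther ones are pruned more often.
    for child in sorted(node.children, key=lambda c: box_dist2(q, c.lo, c.hi)):
        best = nearest(child, q, best)
    return best
```

A small threshold yields a deeper tree with cheap leaf scans, while a large threshold yields a shallow tree with more expensive leaf scans; this is the trade-off that a threshold-setting policy has to balance.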


2021 ◽  
Author(s):  
Stiw Herrera ◽  
Larissa Miguez da Silva ◽  
Paulo Ricardo Reis ◽  
Anderson Silva ◽  
Fabio Porto

Scientific data is mainly multidimensional in nature, which presents interesting opportunities for optimization when it is managed by array databases. However, in scenarios where data is sparse, an efficient implementation is still required. In this paper, we investigate the adoption of the Ph-tree as an in-memory indexing structure for sparse data. We compare the performance in data ingestion and in both range and punctual queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenges of providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.


2021 ◽  
Vol 14 (13) ◽  
pp. 3253-3266
Author(s):  
Jian Liu ◽  
Kefei Wang ◽  
Feng Chen

Time-series databases are becoming an indispensable component in today's data centers. In order to manage the rapidly growing time-series data, we need an effective and efficient system solution to handle the huge traffic of time-series data queries. A promising solution is to deploy a high-speed, large-capacity cache system to relieve the burden on the backend time-series databases and accelerate query processing. However, time-series data is drastically different from other traditional data workloads, bringing both challenges and opportunities. In this paper, we present a flash-based cache system design for time-series data, called TSCache. By exploiting the unique properties of time-series data, we have developed a set of optimization schemes, such as slab-based data management, a two-layered data indexing structure, an adaptive time-aware caching policy, and a low-cost compaction process. We have implemented a prototype based on Twitter's Fatcache. Our experimental results show that TSCache can significantly improve client query performance, effectively increasing the bandwidth by a factor of up to 6.7 and reducing the latency by up to 84.2%.


2021 ◽  
Author(s):  
Wei Wang ◽  
Yi Zhang ◽  
Genyu Ge ◽  
Qin Jiang ◽  
Yang Wang ◽  
...  

2021 ◽  
Vol 14 (11) ◽  
pp. 2073-2086
Author(s):  
Yifan Li ◽  
Xiaohui Yu ◽  
Nick Koudas

Set similarity search is a problem of central interest to a wide variety of applications such as data cleaning and web search. Past approaches to set similarity search utilize either heavy indexing structures, which incur large search costs, or indexes that produce large candidate sets. In this paper, we design a learning-based exact set similarity search approach, LES3. Our approach first partitions sets into groups, and then utilizes a light-weight bitmap-like indexing structure, called the token-group matrix (TGM), to organize groups and prune out candidates given a query set. In order to optimize pruning using the TGM, we analytically investigate the optimal partitioning strategy under certain distributional assumptions. Using these results, we then design a learning-based partitioning approach called L2P and an associated data representation encoding, PTR, to identify the partitions. We conduct extensive experiments on real and synthetic datasets to fully study LES3, establishing its effectiveness and superiority over other applicable approaches.
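The group-level pruning described in the abstract above can be sketched as follows. This is only an illustration under assumptions: the token-group matrix is stored as a dictionary from token to group ids, Jaccard is assumed as the similarity measure, the partitioning into groups is taken as given (no learned L2P partitioning), and the names `build_tgm`, `search`, and `jaccard` are hypothetical. The pruning rule is exact because, for any set s in a group, |q ∩ s| is at most the number of query tokens that occur in the group and |q ∪ s| is at least |q|.

```python
from typing import Dict, List, Set, Tuple

def build_tgm(groups: List[List[Set[str]]]) -> Dict[str, Set[int]]:
    """Record, for each token, the ids of the groups in which it occurs."""
    tgm: Dict[str, Set[int]] = {}
    for gid, group in enumerate(groups):
        for s in group:
            for token in s:
                tgm.setdefault(token, set()).add(gid)
    return tgm

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def search(query: Set[str], groups: List[List[Set[str]]],
           tgm: Dict[str, Set[int]], threshold: float) -> Tuple[List[Set[str]], int]:
    """Return the sets with Jaccard >= threshold and the number of groups verified."""
    if not query:
        raise ValueError("this sketch assumes a non-empty query set")
    # hits[g] = number of query tokens that occur somewhere in group g.
    hits = [0] * len(groups)
    for token in query:
        for gid in tgm.get(token, ()):
            hits[gid] += 1
    results, verified = [], 0
    for gid, group in enumerate(groups):
        # Upper bound on the similarity of the query to any set in the group.
        if hits[gid] / len(query) < threshold:
            continue                # no set in this group can qualify
        verified += 1
        results.extend(s for s in group if jaccard(query, s) >= threshold)
    return results, verified
```

How well this prunes depends entirely on the partitioning: groups whose sets share few tokens with typical queries are skipped without touching their members, which is why the choice of partitions matters.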


2021 ◽  
Vol 15 (4) ◽  
Author(s):  
Shanshan Chen ◽  
Guiping Zhou ◽  
Xingdi An

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nicola Licheri ◽  
Vincenzo Bonnici ◽  
Marco Beccuti ◽  
Rosalba Giugno

Background: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search for one substructure within one graph, called the target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited when searching in a collection of target graphs, which is referred to as the one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches quickly prune target graphs, or parts of them, that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a level of granularity sufficient for a powerful filtering step. Features are stored in data structures that allow efficient access. Index size, querying time, and filtering power are key points in the development of efficient subgraph searching solutions.

Results: An existing approach, GRAPES, has been shown to perform well in terms of speed-up for both the one-to-one and one-to-many cases. However, it suffers from the size of the index it builds. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation, showing that GRAPES-DD substantially reduces memory utilization compared to GRAPES without worsening the searching time.

Conclusion: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising, thanks to their ability to encode sets compactly by exploiting their structure and regularity, and to manipulate entire sets of elements at once instead of exploring each element explicitly. Search strategies based on Decision Diagrams make indexing more affordable, for biochemical graphs and beyond, potentially allowing us to deal with huge and ever-growing collections of biochemical and biological structures.
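The filter-and-verification scheme outlined in the abstract above can be sketched in a few lines. This is a simplified illustration, not GRAPES or GRAPES-DD: the indexed features here are plain counts of node labels and of label pairs on edges (an assumption made for brevity, in place of any richer feature index), verification is a brute-force search that is only practical for very small graphs, and the names `features`, `passes_filter`, `contains`, and `search` are hypothetical. A graph is a pair `(labels, edges)` where `labels` maps node to label and `edges` is a set of two-element frozensets (undirected, no self-loops).

```python
from collections import Counter
from itertools import permutations

def features(labels, edges):
    """Count node labels and unordered label pairs on edges."""
    f = Counter(("node", lab) for lab in labels.values())
    for e in edges:
        u, v = tuple(e)
        f[("edge",) + tuple(sorted((labels[u], labels[v])))] += 1
    return f

def passes_filter(query_f, target_f):
    """A target can contain the query only if it has at least as many of each feature."""
    return all(target_f[k] >= c for k, c in query_f.items())

def contains(query, target):
    """Brute-force non-induced subgraph isomorphism; exponential, small inputs only."""
    q_labels, q_edges = query
    t_labels, t_edges = target
    q_nodes = list(q_labels)
    for image in permutations(t_labels, len(q_nodes)):
        m = dict(zip(q_nodes, image))   # injective node mapping
        if all(q_labels[n] == t_labels[m[n]] for n in q_nodes) and \
           all(frozenset(m[n] for n in e) in t_edges for e in q_edges):
            return True
    return False

def search(query, targets):
    """Return (ids of targets containing the query, number of targets verified)."""
    index = [features(*t) for t in targets]   # built once per collection
    query_f = features(*query)
    candidates = [i for i, f in enumerate(index) if passes_filter(query_f, f)]
    return [i for i in candidates if contains(query, targets[i])], len(candidates)
```

The filter is sound because an injective embedding maps distinct query nodes and edges to distinct target nodes and edges with the same labels, so it never discards a true match; it may still let through targets that fail verification, which is where filtering power comes in.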


Author(s):  
Manar A. Elmeiligy ◽  
Ali I. El Desouky ◽  
Sally M. Elghamrawy
Keyword(s):  
Big Data ◽  

Author(s):  
Zhonghua Wang ◽  
Ting Yao ◽  
Jiguang Wan ◽  
Hong Jiang ◽  
Cui Qiu ◽  
...  

2020 ◽  
Vol 12 (10) ◽  
pp. 168781402097031
Author(s):  
Shedong Ren ◽  
Fangzhi Gui ◽  
Yanwei Zhao ◽  
Min Zhan ◽  
Wanliang Wang

In the initial stage of low-carbon product design, design information is always uncertain and incomplete, and design attributes are coupled with one another, so design conflicts that result from including low-carbon requirements need retrospective coordination. Reusing prior design knowledge can improve design efficiency. However, acquiring knowledge from similar cases requires considering not only the similarity of the design problems but also the adaptability of the candidate cases. This study presents an effective similarity determination model to support low-carbon product design. The targets of the proposed model are (1) to reasonably determine the design ranges of attribute values for product case retrieval by representing uncertain design attributes with fuzzy set theory; (2) to construct an efficient indexing structure that generates the index set of similar cases based on the improved discretized highest similarity method, using two proposed strategies; and (3) to establish similarity estimation models for different types of attributes and to calculate the information content of each attribute in order to evaluate the adaptability of cases based on the Information Axiom. The applicability of the proposed model is demonstrated through a case study of similar case retrieval for the low-carbon design of a vacuum pump.

