HD-Tree: An Efficient High-Dimensional Virtual Index Structure Using a Half Decomposition Strategy

Web search is one of the fastest-growing fields today. The large amount of information available on the World Wide Web motivates the need for efficient text indexing methods that support fast text retrieval. Two main indexing techniques have been proposed in the past: signature files and inverted files. Signature files require much more space to store the index and are more expensive to construct and update than inverted files. Inverted files have been implemented efficiently using structures such as sorted arrays and B-trees. However, a sorted array is very expensive to update when a new keyword is appended, and the B-tree method breaks down when many words share the same prefix. This paper presents a modified index structure for text retrieval that is designed to optimize both the space needed to store the index and the time needed to search documents. The proposed index is built on the Wavelet Tree (WT), which was originally designed as a wavelet transform for images. Experimental results show that as the query length increases, the WT-based index performs better than the others.
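
As a rough illustration of the data structure named in the abstract above, the following is a minimal pointer-based wavelet tree over a sequence of words, supporting a rank query (how often a word occurs among the first i positions). It is a sketch only and not the index proposed in the paper: the class name `WaveletTree`, the `rank` method, and the choice to split the sorted vocabulary in half at each node are assumptions made for this example, and the bit vectors are stored as plain prefix-sum lists rather than in compressed form.

```python
class WaveletTree:
    """Illustrative wavelet tree over a sequence of sortable, hashable symbols."""

    def __init__(self, seq, alphabet=None):
        # Sorted list of the distinct symbols handled by this node.
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.prefix = None              # stays None for leaf nodes
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return                      # leaf: at most one distinct symbol
        mid = len(self.alphabet) // 2
        self.left_syms = frozenset(self.alphabet[:mid])
        # Bit is 0 if the symbol belongs to the left half, 1 otherwise.
        # prefix[i] = number of 1-bits among the first i positions.
        self.prefix = [0]
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s not in self.left_syms))
        self.left = WaveletTree(
            [s for s in seq if s in self.left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [s for s in seq if s not in self.left_syms], self.alphabet[mid:])

    def rank(self, symbol, i):
        """Return the number of occurrences of symbol in seq[:i]."""
        node = self
        while node.prefix is not None:
            ones = node.prefix[i]
            if symbol in node.left_syms:
                node, i = node.left, i - ones   # map i to the 0-bit subsequence
            else:
                node, i = node.right, ones      # map i to the 1-bit subsequence
        # At a leaf, i counts the symbol only if the leaf really stores it.
        return i if node.alphabet == [symbol] else 0


words = "to be or not to be".split()
wt = WaveletTree(words)
print(wt.rank("to", 6), wt.rank("be", 2), wt.rank("xyz", 6))
```

Under these assumptions, the number of occurrences of a word between positions i and j is `wt.rank(w, j) - wt.rank(w, i)`, and each query descends one node per tree level.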

A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases

10.32920/14638896.v1 | 2021

Author(s): Shengnan Ke, Jun Gong, Songnian Li, Qing Zhu, Xintao Liu, ...

Keyword(s): Hash Table, Index Structure, Hybrid Index, Temporal Data, Trajectory Data, Generation Efficiency, Tree Structures, Data Indexing, Indexing Method, Spatio Temporal

In recent years, indoor and outdoor positioning sensors have proliferated, continuously producing huge volumes of trajectory data that are used in fields such as location-based services and location intelligence. Trajectory data are growing massively and are semantically complex, which poses a great challenge for spatio-temporal data indexing. This paper proposes a spatio-temporal data indexing method named HBSTR-tree, a hybrid index structure comprising a spatio-temporal R-tree, a B*-tree and a hash table. To improve index generation efficiency, rather than inserting trajectory points directly, we group consecutive trajectory points into nodes according to their spatio-temporal semantics and then insert them into the spatio-temporal R-tree as leaf nodes. The hash table manages the latest leaf nodes to reduce the frequency of insertion. A new spatio-temporal interval criterion and a new node-choosing sub-algorithm are also proposed to optimize the spatio-temporal R-tree structure. In addition, a B*-tree sub-index of leaf nodes is built to query the trajectories of targeted objects efficiently. Furthermore, a database storage scheme based on a NoSQL-type DBMS is proposed for cloud storage. Experimental results show that HBSTR-tree outperforms TB*-tree in aspects such as generation efficiency, query performance and supported query types.
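
The grouping step described in the abstract above can be sketched roughly as follows: a hash table keeps the latest open leaf node per moving object, and a node is handed to the tree only once it is closed. This is an illustration under stated assumptions, not the HBSTR-tree algorithm itself. The names `LeafNode` and `TrajectoryGrouper`, the closing rule (a maximum point count and a maximum time gap), and the `flushed` list standing in for the R-tree insertion are all hypothetical; the paper's actual spatio-temporal criteria are not given in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """A group of consecutive (t, x, y) points from one moving object."""
    object_id: str
    points: list = field(default_factory=list)

    def mbr(self):
        # Minimum bounding box in (t, x, y): mins first, then maxes.
        ts, xs, ys = zip(*self.points)
        return (min(ts), min(xs), min(ys), max(ts), max(xs), max(ys))


class TrajectoryGrouper:
    def __init__(self, max_points=4, max_gap=60.0):
        self.max_points = max_points  # assumed closing rule: node is full
        self.max_gap = max_gap        # assumed closing rule: time gap too large
        self.latest = {}              # hash table: object_id -> open leaf node
        self.flushed = []             # stand-in for inserting into the R-tree

    def add(self, object_id, t, x, y):
        node = self.latest.get(object_id)
        if node is not None:
            full = len(node.points) >= self.max_points
            gap = t - node.points[-1][0] > self.max_gap
            if full or gap:
                # Close the node: one tree insertion for many points.
                self.flushed.append(node)
                node = None
        if node is None:
            node = LeafNode(object_id)
            self.latest[object_id] = node
        node.points.append((t, x, y))


grouper = TrajectoryGrouper(max_points=3, max_gap=10)
for oid, t in [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("a", 3), ("b", 50)]:
    grouper.add(oid, t, float(t), 0.0)
print(len(grouper.flushed), grouper.flushed[0].mbr())
```

In this sketch six incoming points trigger only two tree insertions, which is the kind of saving the abstract attributes to grouping points and buffering the latest leaf nodes in a hash table.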

Vibration-Based Outlier Detection on High Dimensional Data

International Journal of Artificial Intelligence Tools | 10.1142/s0218213016500135 | 2016 | Vol 25 (03) | pp. 1650013

Author(s): Shuyin Xia, Guoyin Wang, Hong Yu, Qun Liu, Jin Wang

Keyword(s): Outlier Detection, Time Complexity, State Of The Art, High Dimensional Data, Difficult Problem, Index Structure, High Dimensional, Basic Model, Traditional Approaches, Better Than

Outlier detection is a difficult problem because its time complexity is quadratic or cubic in most cases, which makes acceleration algorithms necessary. Because the main acceleration algorithms rely on an index structure (e.g., an R-tree), they deteriorate as the dimensionality increases. In this paper, an approach named VBOD (vibration-based outlier detection) is proposed, in which the main variants assess the vibration. Since the basic model and the approximation algorithm FASTVBOD do not need to compute an index structure, their performance is less sensitive to increasing dimensionality than that of traditional approaches. The basic model of this approach has only quadratic time complexity. Furthermore, accelerated algorithms decrease the time complexity to [Formula: see text]. Another advantage is that this approach does not rely on any parameter selection. FASTVBOD was compared with other state-of-the-art algorithms and performed much better than the other methods, especially on high-dimensional data.

SR-tree: An index structure for nearest-neighbor searching of high-dimensional point data

Systems and Computers in Japan | 10.1002/(sici)1520-684x(19980615)29:6<59::aid-scj6>3.0.co;2-k | 1998 | Vol 29 (6) | pp. 59-73 | Cited By ~ 1

Author(s): Norio Katayama, Shin'ichi Satoh

Keyword(s): Nearest Neighbor, Index Structure, High Dimensional, Nearest Neighbor Searching, Point Data

A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases

10.32920/14638896 | 2021

Author(s): Shengnan Ke, Jun Gong, Songnian Li, Qing Zhu, Xintao Liu, ...

PK-Tree: A Spatial Index Structure for High Dimensional Point Data

Information Organization and Databases | 10.1007/978-1-4615-1379-7_20 | 2000 | pp. 281-293 | Cited By ~ 2

Author(s): Wei Wang, Jiong Yang, Richard Muntz

Keyword(s): Spatial Index, Index Structure, High Dimensional, Point Data

An efficient bitmap indexing method for similarity search in high dimensional multimedia databases

2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763) | 10.1109/icme.2004.1394325 | 2005 | Cited By ~ 1

Author(s): Jinguk Jeong, Jongho Nang

Keyword(s): Similarity Search, Multimedia Databases, High Dimensional, Indexing Method

Improved Trust Region Based MPS Method for High-Dimensional Expensive Black-Box Problems

Volume 3B: 39th Design Automation Conference | 10.1115/detc2013-12665 | 2013

Author(s): George H. Cheng, Adel Younis, Kambiz Haji Hajikolaei, G. Gary Wang

Keyword(s): Optimization Problems, Trust Region, Black Box, High Dimensional, Test Problems, Design Variables, Low Dimensionality, Improve Algorithm, Improved Performance, Better Than

Mode Pursuing Sampling (MPS) was developed as a global optimization algorithm for problems involving expensive black-box functions. MPS has been found to be effective and efficient for problems of low dimensionality, i.e., those with fewer than ten design variables. A previous conference publication integrated the concept of trust regions into the MPS framework to create a new algorithm, TRMPS, which dramatically improved performance and efficiency for high-dimensional problems. However, although TRMPS performed better than MPS, it had not been tested against other established algorithms such as GA. This paper introduces an improved algorithm, TRMPS2, which incorporates guided sampling and a low function value criterion to further improve performance on high-dimensional problems. TRMPS2 is benchmarked against MPS and GA using a suite of test problems. The results show that, on average, TRMPS2 performs better than MPS and GA on high-dimensional, expensive, black-box (HEB) problems.
