ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing

Author(s):  
Shirou Maruyama ◽  
Masaya Nakahara ◽  
Naoya Kishiue ◽  
Hiroshi Sakamoto
Author(s):  
Francisco Santoyo ◽  
Edgar Chávez ◽  
Eric S. Téllez

Author(s):  
Hussein Al-Bahadili ◽  
Saif Al-Saab

In this paper, the authors describe a new Web search engine model, the compressed index-query (CIQ) Web search engine model. The model incorporates two bit-level compression layers implemented at the back-end processor (server) side. The first layer (the index compressor) resides after the indexer and acts as a second compression layer to generate a double-compressed index. The second layer (the query compressor) resides after the query parser and compresses queries, enabling bit-level compressed index-query search. The data compression algorithm used in this model is the Hamming codes-based data compression (HCDC) algorithm, an asymmetric, lossless, bit-level algorithm that permits CIQ search. The components of the model are implemented in a prototype CIQ test tool (CIQTT), which serves as a test bench to validate the accuracy and integrity of the retrieved data and to evaluate the performance of the proposed model. The test results demonstrate that the proposed CIQ model reduces disk space requirements and searching time by more than 24% and attains 100% agreement with an uncompressed model.


Author(s):  
Wing-Kai Hon ◽  
Tak-Wah Lam ◽  
Rahul Shah ◽  
Siu-Lung Tam ◽  
Jeffrey Scott Vitter

2016 ◽  
Vol 638 ◽  
pp. 159-170 ◽  
Author(s):  
Joong Chae Na ◽  
Hyunjoon Kim ◽  
Heejin Park ◽  
Thierry Lecroq ◽  
Martine Léonard ◽  
...  

2021 ◽  
Author(s):  
Omar Ahmed ◽  
Massimiliano Rossi ◽  
Sam Kovaka ◽  
Michael Schatz ◽  
Travis Gagie ◽  
...  

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "non-target" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing with the help of efficient pangenome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics (half-maximal exact matches) in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to minimap2's when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 15 and 4 times smaller than minimap2's, respectively. These improvements become even more pronounced with larger reference databases; SPUMONI's index size scales sublinearly with the number of reference genomes included. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously. SPUMONI is open source software available from https://github.com/oma219/spumoni.


2001 ◽  
Vol 135 (1-2) ◽  
pp. 13-28 ◽  
Author(s):  
Paolo Ferragina ◽  
Giovanni Manzini
