Bloom Filter Tree for Fast Search and Synchronization of Tree-Structured Data

Author(s):
Mengyu Wang
Ying Zhu

2018
Author(s):
Temesgen Hailemariam Dadi
Enrico Siragusa
Vitor C. Piro
Andreas Andrusch
Enrico Seiler
...

Abstract
Motivation: Mapping-based approaches have become limited in their application to very large sets of references, since computing an FM-index for very large databases (e.g. > 10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of NGS reads to reference databases. For instance, in a typical metagenomics analysis, the size of the reference sequences has become prohibitive for computing a single full-text index on standard machines. Even on large-memory machines, computing such an index takes about one day of computing time. As a result, updates of indices are rarely performed. Hence, it is desirable to create an alternative way of indexing while preserving fast search times.
Results: To solve the index construction and update problem we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The main contribution is the introduction of approximate search distributor directories via a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices which can be easily rebuilt if parts are updated, while maintaining a fast search time. The second main contribution is an implementation of DREAM-Yara, a distributed version of a fully sensitive read mapper under the DREAM framework.
Availability and implementation: https://gitlab.com/pirovc/dream_yara/
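For illustration, the interleaving idea can be sketched as follows: the bits of all sub-filters for one hash position are stored next to each other, so a query reads one block per hash function and intersects the blocks to find the bins (database parts) that may contain a k-mer. This is a minimal Python sketch, not the authors' SeqAn/DREAM-Yara implementation; the filter size, hash scheme, and bin count are assumptions chosen for readability.

# Minimal sketch of an interleaved Bloom filter (IBF); sizes and hashing
# are illustrative assumptions, not the DREAM-Yara implementation.
class InterleavedBloomFilter:
    def __init__(self, num_bins, bits_per_bin=1 << 16, num_hashes=3):
        self.b = num_bins        # one sub-filter per database part
        self.m = bits_per_bin    # positions per sub-filter
        self.h = num_hashes
        # Bit pos*b + j holds position pos of bin j, so all bins for one
        # position sit next to each other ("interleaved").
        self.bits = bytearray(self.m * self.b // 8 + 1)

    def _positions(self, kmer):
        for seed in range(self.h):
            yield hash((seed, kmer)) % self.m

    def _set(self, idx):
        self.bits[idx // 8] |= 1 << (idx % 8)

    def _get(self, idx):
        return (self.bits[idx // 8] >> (idx % 8)) & 1

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self._set(pos * self.b + bin_id)

    def query(self, kmer):
        """Return the set of bins that may contain the k-mer (no false negatives)."""
        hits = set(range(self.b))
        for pos in self._positions(kmer):
            block = {j for j in range(self.b) if self._get(pos * self.b + j)}
            hits &= block        # a bin must be set at every hash position
        return hits

# Usage: exclude database parts where a read's k-mers cannot match.
ibf = InterleavedBloomFilter(num_bins=4)
ibf.insert("ACGTACGT", bin_id=2)
print(ibf.query("ACGTACGT"))   # contains 2, plus possible false-positive bins
print(ibf.query("TTTTTTTT"))   # very likely empty -> all bins can be skipped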


1994
Vol 33 (05)
pp. 454-463
Author(s):
A. M. van Ginneken
J. van der Lei
J. H. van Bemmel
P. W. Moorman

Abstract: Clinical narratives in patient records are usually recorded in free text, limiting the use of this information for research, quality assessment, and decision support. This study focuses on the capture of clinical narratives in a structured format by supporting physicians with structured data entry (SDE). We analyzed and made explicit which requirements SDE should meet to be acceptable to the physician on the one hand, and to generate unambiguous patient data on the other. Starting from these requirements, we found that in order to support SDE, the knowledge on which it is based needs to be made explicit: we refer to this knowledge as descriptional knowledge. We articulate the nature of this knowledge and propose a model in which it can be formally represented. The model allows the construction of specific knowledge bases, each representing the knowledge needed to support SDE within a circumscribed domain. Data entry is made possible through a general entry program, whose behavior is determined by a combination of user input and the content of the applicable domain knowledge base. We clarify how descriptional knowledge is represented, modeled, and used for data entry to achieve SDE that meets the proposed requirements.
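To illustrate the separation between a general entry program and a domain knowledge base, the following minimal Python sketch drives data entry from a small, hypothetical "descriptional knowledge" structure; the format, field names, and validation rules are invented for this example and are not the authors' model.

# Minimal sketch of knowledge-base-driven structured data entry; the KB
# format below is an assumption for illustration only.
DOMAIN_KB = {
    "chest_pain": {
        "attributes": {
            "duration": {"type": "number", "unit": "minutes"},
            "character": {"type": "choice",
                          "values": ["stabbing", "pressing", "burning"]},
            "radiation": {"type": "choice",
                          "values": ["none", "left arm", "jaw"]},
        }
    }
}

def enter_finding(finding, answers):
    """Generic entry routine: its behavior comes from the KB, not from code."""
    spec = DOMAIN_KB[finding]["attributes"]
    record = {}
    for name, rule in spec.items():
        value = answers.get(name)
        if rule["type"] == "choice" and value not in rule["values"]:
            raise ValueError(f"{name}: {value!r} not in {rule['values']}")
        record[name] = value
    return {finding: record}

print(enter_finding("chest_pain",
                    {"duration": 20, "character": "pressing",
                     "radiation": "left arm"}))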


1992
Vol 31 (04)
pp. 268-274
Author(s):
W. Gaus
J. G. Wechsler
P. Janowitz
J. Tudyka
W. Kratzer
...

Abstract: A system using structured reporting of findings was developed for the preparation of medical reports and for clinical documentation purposes in upper abdominal sonography, and evaluated in the course of routine use. The evaluation focussed on the following parameters: completeness and correctness of the entered data, the proportion of free text, the validity and objectivity of the documentation, user acceptance, and time required. For two clinically relevant parameters, completeness could be compared with an already existing database of freely dictated reports. The results confirmed the hypothesis that, for the description of results of a technical examination, structured data reporting is a viable alternative to free-text dictation. For the application evaluated, there is even evidence of the superiority of a structured approach. The system can be put to use in related areas of application.


1996
Vol 35 (03)
pp. 261-264
Author(s):
T. Schromm
T. Frankewitsch
M. Giehl
F. Keller
D. Zellner

Abstract: A pharmacokinetic database was constructed that is as free of errors as possible. Pharmacokinetic parameters were derived from the literature using a text-processing system and a database system. A random data sample from each system was compared with the original literature. The error frequencies, estimated using statistical methods, differed significantly between the two systems: the estimated error frequency in the text-processing system was 7.2%, that in the database system 2.7%. Compared with the original values in the literature, the estimated probability of error for identical pharmacokinetic parameters recorded in both systems is 2.4%, which is not significantly different from the error frequency in the database system. Parallel data entry with a text-processing system and a database system is, therefore, not significantly better than structured data entry for reducing the error frequency.
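The abstract does not name the statistical method used to compare the error frequencies. The sketch below shows one plausible way to compare two error proportions, a two-proportion z-test, with hypothetical sample sizes; both the test choice and the sample sizes are assumptions made purely for illustration.

# Hedged illustration: comparing two error proportions (7.2% vs. 2.7%)
# with a two-proportion z-test; the sample sizes n1, n2 are hypothetical.
from math import sqrt, erfc

def two_proportion_z(p1, n1, p2, n2):
    x1, x2 = p1 * n1, p2 * n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value
    return z, p_value

# Hypothetical sample of 500 checked parameters per system
z, p = two_proportion_z(0.072, 500, 0.027, 500)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")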


Author(s):  
LAKSHMI PRANEETHA

Nowadays, data streams (information streams) are enormous and fast-changing. Their uses range from basic scientific applications to critical business and financial ones. Useful information is abstracted from the stream and represented in the form of micro-clusters in the online phase; in the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used in re-clustering to improve the formation of clusters, but DBSTREAM takes more time when handling corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing of the information, and a Bloom filter is used to recognize corrupted data. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and increases the number of micro-clusters generated within a short time.
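A minimal sketch of the Bloom-filter pruning step is given below. It assumes that identifiers of corrupted records can be inserted into the filter beforehand (e.g. from a validation pass); the record identifiers, filter parameters, and hash scheme are assumptions, and this is an illustration rather than the paper's implementation or the DBSTREAM code.

# Minimal sketch: a Bloom filter used to drop corrupted records before
# pre-processing; parameters and record IDs below are hypothetical.
from hashlib import blake2b

class BloomFilter:
    def __init__(self, size=1 << 20, num_hashes=4):
        self.size, self.h = size, num_hashes
        self.bits = bytearray(size // 8)

    def _positions(self, item):
        for seed in range(self.h):
            digest = blake2b(item.encode(), salt=bytes([seed] * 8)).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

corrupted = BloomFilter()
for record_id in ["sensor-17:t=102", "sensor-03:t=887"]:   # hypothetical IDs
    corrupted.add(record_id)

def prune(stream):
    """Early pruning: skip records flagged as corrupted before pre-processing."""
    return (rec for rec in stream if rec not in corrupted)

clean = list(prune(["sensor-05:t=001", "sensor-17:t=102"]))
print(clean)   # the corrupted record is filtered out (up to false positives)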

