Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries

2021, pp. 101791
Author(s): Martin Schäler, Christine Tex, Veit Köppen, David Broneske, Gunter Saake
2020, Vol 10 (3), pp. 999
Author(s): Hyokyung Bahn, Kyungwoon Cho

Recently, non-volatile memory (NVM) has advanced as a fast storage medium, and legacy memory subsystems optimized for DRAM (dynamic random access memory) and HDD (hard disk drive) hierarchies need to be revisited. In this article, we explore memory subsystems that use NVM as the underlying storage device and discuss the challenges and implications of such systems. As storage performance approaches DRAM performance, existing memory configurations and I/O (input/output) mechanisms should be reassessed. We measure the performance of systems with NVM-based storage, emulated by a RAMDisk, under various configurations. Our measurement study yields the following findings. (1) The main memory size can be decreased without performance penalties when NVM storage is adopted instead of HDD. (2) For buffer caching to be effective, judicious management techniques such as admission control are necessary. (3) Prefetching is not effective in NVM storage. (4) The effect of synchronous I/O and direct I/O is less significant in NVM storage than in HDD storage. (5) Performance degradation due to contention among multiple threads is less severe in NVM-based storage than in HDD. Based on these observations, we discuss a new PC configuration consisting of small memory and fast storage in comparison with a traditional PC consisting of large memory and slow storage. We show that this new memory-storage configuration can be an alternative solution for ever-growing memory demands and the limited density of DRAM. We anticipate that our results will provide directions for system software development in the presence of ever-faster storage devices.


2002, Vol 2 (1), pp. 36-47
Author(s): Philip L. Bohannon, Rajeev R. Rastogi, Avi Silberschatz, S. Sudarshan

Paleobiology, 1987, Vol 13 (3), pp. 272-285
Author(s): Jennifer A. Kitchell, George Estabrook, Norman MacLeod

A new method of data analysis offers a potentially powerful tool for statistically evaluating hypotheses of rate in temporally ordered evolutionary phenomena. We present a method for bootstrapping time-ordered data sets to test hypotheses of the equality of rate. This method is applicable to both nonrandom and random generative processes. The method is applied to the data of Malmgren et al. (1983) for the Globorotalia plesiotumida–G. tumida planktonic foraminiferan lineage and the data of Reyment (1982) for the benthonic foraminiferan Afrobolivina afar. G. plesiotumida is recognizable on the basis of independent data as a species distinct from G. tumida, its descendant. The rate of evolutionary change during the evolution of G. tumida from G. plesiotumida is shown to be faster than the rates within either species. The pattern of variation exhibited by A. afar includes a time interval of more rapid change; this more rapid change is observed post hoc. A bootstrapping model based on post hoc observations reveals that the rate in this time interval is not significantly faster than expected in such post hoc intervals.
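
The abstract does not spell out the resampling procedure, so the following is only a generic sketch of a bootstrap test for equality of rate, not the authors' method. It assumes equally spaced samples, measures rate as the mean absolute step-to-step change, and tests whether a chosen interval changes faster than the rest of the series. The function names `rate_difference` and `bootstrap_rate_test` and all parameters are hypothetical.

```python
import random


def rate_difference(increments, start, stop):
    """Mean |change| inside increments[start:stop] minus mean |change| outside.

    Assumes 0 <= start < stop <= len(increments) and that at least one
    increment lies outside the interval.
    """
    inside = [abs(x) for x in increments[start:stop]]
    outside = [abs(x) for x in increments[:start] + increments[stop:]]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def bootstrap_rate_test(values, start, stop, n_resamples=2000, seed=0):
    """One-sided bootstrap test: is the interval's rate faster than the rest?

    Null hypothesis (an assumption of this sketch): every step-to-step
    increment is drawn from the same distribution, so rates are equal.
    """
    rng = random.Random(seed)
    # Step-to-step changes of the time-ordered series.
    increments = [b - a for a, b in zip(values, values[1:])]
    observed = rate_difference(increments, start, stop)

    hits = 0
    for _ in range(n_resamples):
        # Resample increments with replacement from the pooled series.
        resampled = [rng.choice(increments) for _ in increments]
        if rate_difference(resampled, start, stop) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical trait means with a burst of change in steps 3..5.
    series = [0.0, 0.1, 0.2, 0.3, 2.3, 4.3, 6.3, 6.4, 6.5, 6.6]
    print(bootstrap_rate_test(series, start=3, stop=6))
```

Note that this sketch fixes the interval in advance. When the interval is picked post hoc because it looks fast, as the abstract describes for A. afar, the null distribution would instead need to be built from the fastest interval found in each resample.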


2005, Vol 182 (1), pp. 11-24
Author(s): Radhouan Ben-Hamadou, Ibanez Frédéric, Picheral Marc, Gorsky Gabriel

2021, Vol 22 (2), pp. 119-134
Author(s): Ahad Shamseen, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, Qin Xin

Data mining is the extraction of information and its roles from a vast amount of data, and it is among the most important topics today. Massive amounts of data are generated and stored each day, and this data holds useful information in different fields that attracts the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also drawbacks. One of their main drawbacks is that the data must reside in main memory. SPRINT is a decision tree classifier that proposes a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on the results of SPRINT. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed classifier can be implemented in serial and parallel environments and can deal with big data.


2007, Vol 3 (1), pp. 19-23
Author(s): Seok-Jae Lee, Jong-Hyun Yoon, Seok-Il Song, Jae-Soo Yoo

2020, Vol 10 (7), pp. 2539
Author(s): Toan Nguyen Mau, Yasushi Inoguchi

It is challenging to build a real-time information retrieval system, especially for high-dimensional big data. To structure big data, many hashing algorithms have been proposed that map similar data items to the same bucket to speed up search. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set by using a family of hash functions and a hash table. The LSH hash table is an additional component that supports the indexing of hash values (keys) for the corresponding data items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm with a dynamically structured hash table, optimized for storage in main memory and in General-Purpose computation on Graphics Processing Units (GPGPU) memory. This supports the handling of constantly updated data sets, such as song, image, or text databases. The DLSH algorithm works effectively with data sets that are updated at high frequency and is compatible with parallel processing. However, a single GPGPU device is inadequate for processing big data because of the small memory capacity of GPGPU devices. When using multiple GPGPU devices for searching, an effective search algorithm is needed to balance the jobs. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs in order to increase the capacity and performance of the information retrieval system. Different search strategies on multiple DLSH clusters are also proposed to adapt our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
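
To make the bucket idea concrete, here is a minimal sketch of a plain, static LSH table. It is not the DLSH algorithm and uses no GPGPU. It assumes random-hyperplane hashing (suited to cosine similarity) as the hash family, which the abstract does not specify. The names `LSHTable`, `lsh_key`, and `make_hyperplanes` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per hash bit (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Pack the sign of each hyperplane projection into an integer key."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


class LSHTable:
    """Hash table mapping LSH keys to the ids of the items stored under them."""

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def insert(self, item_id, vector):
        self.buckets[lsh_key(vector, self.hyperplanes)].append(item_id)

    def query(self, vector):
        """Return candidate ids from the bucket the query vector falls into."""
        return list(self.buckets.get(lsh_key(vector, self.hyperplanes), []))


if __name__ == "__main__":
    table = LSHTable(num_bits=8, dim=3, seed=1)
    table.insert("song-1", [1.0, 2.0, 3.0])
    # A vector pointing in the same direction lands in the same bucket.
    print(table.query([2.0, 4.0, 6.0]))
```

A single table can miss near neighbors that fall just across a hyperplane, so practical systems usually query several independently seeded tables and merge the candidates.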


Author(s): Philip Bohannon, Daniel Lieuwen, Rajeev Rastogi, Avi Silberschatz, S. Seshadri, ...