data access patterns Latest Research Papers

We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Like column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads, but these workloads have fundamentally different data access patterns than traditional analytical workloads. We first derive a set of desiderata for optimizing the storage and query processors of GDBMSs based on their access patterns. We then present the design of columnar storage, compression, and query processing techniques based on these desiderata. In addition to showing direct integration of existing techniques from columnar RDBMSs, we also propose novel ones that are optimized for GDBMSs. These include a novel list-based query processor, which avoids the expensive data copies of traditional block-based processors under many-to-many joins; a new data structure we call single-indexed edge property pages, with an accompanying edge ID scheme; and a new application of Jacobson's bit vector index for compressing NULL values and empty lists. We integrated our techniques into the GraphflowDB in-memory GDBMS. Through extensive experiments, we demonstrate the scalability and query performance benefits of our techniques.
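The NULL-compression idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the GraphflowDB implementation: it assumes a single-level rank index (a prefix count per 64-bit block) rather than the two-level structure Jacobson's index uses, and the class name `NullCompressedColumn`, the `BLOCK` size, and the integer-only values are all hypothetical choices made for illustration. Only non-NULL values are stored; a presence bitmap plus the rank index maps a logical position to its slot in the dense array.

```python
from typing import List, Optional, Sequence

BLOCK = 64  # bits per bitmap block (hypothetical choice for this sketch)


class NullCompressedColumn:
    """Stores only non-NULL values, plus a presence bitmap with a rank index."""

    def __init__(self, values: Sequence[Optional[int]]) -> None:
        self.length = len(values)
        n_blocks = (self.length + BLOCK - 1) // BLOCK
        self.bits: List[int] = [0] * n_blocks    # one integer per 64-bit block
        self.prefix: List[int] = [0] * n_blocks  # non-NULL count before each block
        self.dense: List[int] = []               # the non-NULL values, in order
        for i, v in enumerate(values):
            if i % BLOCK == 0:
                # Record how many non-NULL values precede this block.
                self.prefix[i // BLOCK] = len(self.dense)
            if v is not None:
                self.bits[i // BLOCK] |= 1 << (i % BLOCK)
                self.dense.append(v)

    def get(self, i: int) -> Optional[int]:
        """Return the value at logical position i, or None if it is NULL."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        block, offset = divmod(i, BLOCK)
        word = self.bits[block]
        if not (word >> offset) & 1:
            return None
        # rank = non-NULLs before this block + set bits below `offset` in it
        below = word & ((1 << offset) - 1)
        rank = self.prefix[block] + bin(below).count("1")
        return self.dense[rank]
```

With a sparse column such as `[None, 7, None, None, 9]` followed by many NULLs, only the non-NULL values occupy space in `dense`, while each lookup costs one bitmap probe and one population count.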

Towards cost-effective and elastic cloud database deployment via memory disaggregation

Proceedings of the VLDB Endowment ◽ 10.14778/3467861.3467877 ◽ 2021 ◽ Vol 14 (10) ◽ pp. 1900-1912

Author(s): Yingqiang Zhang ◽ Chaoyi Ruan ◽ Cheng Li ◽ Xinjun Yang ◽ Wei Cao ◽ ...

Keyword(s): Memory Management ◽ High Speed ◽ Relational Databases ◽ Large Fraction ◽ Cost Effective ◽ Data Access ◽ Remote Memory ◽ Recent Emergence ◽ Data Access Patterns ◽ Access Patterns

It is challenging for cloud-native relational databases to meet the ever-increasing need to scale compute and memory resources independently and elastically. The recent emergence of memory disaggregation architectures, which rely on high-speed RDMA networks, offers opportunities to build cost-effective and elastic cloud-native databases. There exist proposals to let unmodified applications run transparently on disaggregated systems. However, running a relational database kernel atop such proposals incurs notable performance degradation and time-consuming failure recovery, offsetting the benefits of disaggregation. To address these challenges, in this paper, we propose a novel database architecture called LegoBase, which explores the co-design of the database kernel and memory disaggregation. It pushes memory management back to the database layer in order to bypass the Linux I/O stack and to re-use or design (remote) memory access optimizations with an understanding of data access patterns. LegoBase further splits the conventional ARIES fault tolerance protocol to handle local and remote memory failures independently, for fast recovery of compute instances. We implemented LegoBase atop MySQL. We compare LegoBase against MySQL running on a standalone machine and against the state-of-the-art disaggregation proposal Infiniswap. Our evaluation shows that even with a large fraction of data placed in remote memory, LegoBase's system performance in terms of throughput (up to 9.41% drop) and P99 latency (up to 11.58% increase) is comparable to the monolithic MySQL setup, and significantly outperforms (1.99x-2.33x, respectively) the deployment of MySQL over Infiniswap. Meanwhile, LegoBase achieves up to 3.87x and 5.48x speedups in recovery and warm-up time, respectively, over the monolithic MySQL and MySQL over Infiniswap, when handling failures or planned re-configurations.

Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm

IEEE Transactions on Parallel and Distributed Systems ◽ 10.1109/tpds.2021.3061895 ◽ 2021 ◽ pp. 1-1

Author(s): Gregory Herschlag ◽ Seyong Lee ◽ Jeffrey Vetter ◽ Amanda Randles

Keyword(s): Lattice Boltzmann ◽ Data Access ◽ Complex Geometries ◽ Data Access Patterns ◽ Access Patterns

Graphical Presentations of Clinical Data in a Learning Electronic Medical Record

Applied Clinical Informatics ◽ 10.1055/s-0040-1709707 ◽ 2020 ◽ Vol 11 (04) ◽ pp. 680-691

Author(s): Luca Calzoni ◽ Gilles Clermont ◽ Gregory F. Cooper ◽ Shyam Visweswaran ◽ Harry Hochheiser

Keyword(s): User Interface ◽ Cognitive Abilities ◽ Participatory Design ◽ Relevant Literature ◽ Data Access ◽ Relevant Information ◽ Color Coding ◽ Data Access Patterns ◽ Access Patterns ◽ Icu Physicians

Background: Complex electronic medical records (EMRs) presenting large amounts of data create risks of cognitive overload. We are designing a Learning EMR (LEMR) system that utilizes models of intensive care unit (ICU) physicians' data access patterns to identify and then highlight the most relevant data for each patient.

Objectives: We used insights from literature and feedback from potential users to inform the design of an EMR display capable of highlighting relevant information.

Methods: We used a review of relevant literature to guide the design of preliminary paper prototypes of the LEMR user interface. We observed five ICU physicians using their current EMR systems in preparation for morning rounds. Participants were interviewed and asked to explain their interactions and challenges with the EMR systems. Findings informed the revision of our prototypes. Finally, we conducted a focus group with five ICU physicians to elicit feedback on our designs and to generate ideas for our final prototypes using participatory design methods.

Results: Participating physicians expressed support for the LEMR system. Identified design requirements included the display of data essential for every patient together with diagnosis-specific data and new or significantly changed information. Respondents expressed preferences for fishbones to organize labs, mouseovers to access additional details, and unobtrusive alerts minimizing color-coding. To address the concern about possible physician overreliance on highlighting, participants suggested that non-highlighted data should remain accessible. Study findings led to revised prototypes, which will inform the development of a functional user interface.

Conclusion: In the feedback we received, physicians supported pursuing the concept of a LEMR system. By introducing novel ways to support physicians' cognitive abilities, such a system has the potential to enhance physician EMR use and lead to better patient outcomes. Future plans include laboratory studies of both the utility of the proposed designs on decision-making, and the possible impact of any automation bias.

DAPR-tree: a distributed spatial data indexing scheme with data access patterns to support Digital Earth initiatives

International Journal of Digital Earth ◽ 10.1080/17538947.2020.1778804 ◽ 2020 ◽ Vol 13 (12) ◽ pp. 1656-1671 ◽ Cited By ~ 1

Author(s): Jizhe Xia ◽ Sicheng Huang ◽ Shaobiao Zhang ◽ Xiaoming Li ◽ Jianrong Lyu ◽ ...

Keyword(s): Spatial Data ◽ Data Access ◽ Indexing Scheme ◽ Data Indexing ◽ Digital Earth ◽ Data Access Patterns ◽ Access Patterns

XPlacer: Automatic Analysis of Data Access Patterns on Heterogeneous CPU/GPU Systems

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽ 10.1109/ipdps47924.2020.00106 ◽ 2020

Author(s): Peter Pirkelbauer ◽ Pei-Hung Lin ◽ Tristan Vanderbruggen ◽ Chunhua Liao

Keyword(s): Data Access ◽ Automatic Analysis ◽ Data Access Patterns ◽ Access Patterns

Geant Exascale Pilot Project

EPJ Web of Conferences ◽ 10.1051/epjconf/202024509015 ◽ 2020 ◽ Vol 245 ◽ pp. 09015

Author(s): Philippe Canal ◽ Elizabeth Sexton-Kennedy ◽ Jonathan Madsen ◽ Soon Yung Jun ◽ Guilherme Lima ◽ ...

Keyword(s): Monte Carlo ◽ Monte Carlo Simulations ◽ Data Access ◽ Pilot Project ◽ Software Framework ◽ Single Node ◽ Computing Power ◽ Simulation Problem ◽ Data Access Patterns ◽ Access Patterns

The upcoming generation of exascale HPC machines will all have most of their computing power provided by GPGPU accelerators. In order to be able to take advantage of this class of machines for HEP Monte Carlo simulations, we started to develop a Geant pilot application as a collaboration between HEP and the Exascale Computing Project. We will use this pilot to study and characterize how the machines’ architecture affects performance. The pilot will encapsulate the minimum set of physics and software framework processes necessary to describe a representative HEP simulation problem. The pilot will then be used to exercise communication, computation, and data access patterns. The project’s main objective is to identify re-engineering opportunities that will increase event throughput by improving single node performance and being able to make efficient use of the next generation of accelerators available in Exascale facilities.

Distributed Computing Software and Data Access Patterns in OSG Midscale Collaborations

EPJ Web of Conferences ◽ 10.1051/epjconf/202024503005 ◽ 2020 ◽ Vol 245 ◽ pp. 03005

Author(s): Pascal Paschos ◽ Benedikt Riedel ◽ Mats Rynge ◽ Lincoln Bryant ◽ Judith Stephen ◽ ...

Keyword(s): Distributed Computing ◽ Large Scale ◽ Data Access ◽ Open Science ◽ Distributed Resources ◽ Support Teams ◽ Institutional Researchers ◽ Data Access Patterns ◽ Access Patterns ◽ Open Science Grid

In this paper we showcase the support in Open Science Grid (OSG) for Midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without having dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, assisting with the deployment of their applications, and supporting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.

Profiling Dynamic Data Access Patterns with Controlled Overhead and Quality

Proceedings of the 20th International Middleware Conference Industrial Track ◽ 10.1145/3366626.3368125 ◽ 2019

Author(s): SeongJae Park ◽ Yunjae Lee ◽ Heon Y. Yeom

Keyword(s): Data Access ◽ Dynamic Data ◽ Data Access Patterns ◽ Access Patterns

REVIEWING DATA ACCESS PATTERNS AND COMPUTATIONAL REDUNDANCY FOR MACHINE LEARNING ALGORITHMS

Proceedings of the International Conferences Big Data Analytics, Data Mining and Computational Intelligence 2019; and Theory and Practice in Modern Computing 2019 ◽ 10.33965/bigdaci2019_201907l004 ◽ 2019

Author(s): Imen Chakroun ◽ Tom Vander Aa ◽ Tom Ashby

Keyword(s): Machine Learning ◽ Learning Algorithms ◽ Data Access ◽ Machine Learning Algorithms ◽ Data Access Patterns ◽ Access Patterns

data access patterns
Recently Published Documents

Columnar storage and list-based processing for graph database management systems

Towards cost-effective and elastic cloud database deployment via memory disaggregation

Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm

Graphical Presentations of Clinical Data in a Learning Electronic Medical Record

DAPR-tree: a distributed spatial data indexing scheme with data access patterns to support Digital Earth initiatives

XPlacer: Automatic Analysis of Data Access Patterns on Heterogeneous CPU/GPU Systems

Geant Exascale Pilot Project

Distributed Computing Software and Data Access Patterns in OSG Midscale Collaborations

Profiling Dynamic Data Access Patterns with Controlled Overhead and Quality

REVIEWING DATA ACCESS PATTERNS AND COMPUTATIONAL REDUNDANCY FOR MACHINE LEARNING ALGORITHMS
