Kreon

2021 ◽  
Vol 17 (1) ◽  
pp. 1-32
Author(s):  
Anastasios Papagiannis ◽  
Giorgos Saloustros ◽  
Giorgos Xanthakis ◽  
Giorgos Kalaentzis ◽  
Pilar Gonzalez-Ferez ◽  
...  

Persistent key-value stores have emerged as a main component in the data access path of modern data processing systems. However, they exhibit high CPU and I/O overhead. Nowadays, due to power limitations, it is important to reduce CPU overheads for data processing. In this article, we propose Kreon, a key-value store that targets servers with flash-based storage, where CPU overhead and I/O amplification are more significant bottlenecks than I/O randomness. We first observe two significant sources of overhead in key-value stores: (a) compaction in log-structured merge-trees (LSM-trees), which constantly merges and sorts large data segments, and (b) the use of an I/O cache to access devices, which incurs overhead even for data that reside in memory. To avoid these costs, Kreon moves data from level to level by partial reorganization instead of full data reorganization, maintaining a full index per level, and it performs I/O through a memory-mapped interface with a custom kernel path, avoiding a user-space cache. For a large dataset, Kreon reduces CPU cycles/op by up to 5.8×, reduces I/O amplification for inserts by up to 4.61×, and increases insert ops/s by up to 5.3×, compared to RocksDB.
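Kreon's kernel-level memory-mapped I/O path is not reproduced here, but a minimal user-space Python sketch conveys the idea: mapping the store file makes it directly addressable memory, so lookups that hit the kernel page cache involve no user-space cache or extra copy. The file layout, offsets, and helper names below are assumptions for illustration only.

```python
import mmap
import os

# Hypothetical sketch: serving key-value reads through a memory mapping.
# Kreon does this via a custom kernel path; plain mmap approximates it.

def open_mapped(path):
    fd = os.open(path, os.O_RDWR)
    size = os.fstat(fd).st_size
    return mmap.mmap(fd, size)  # the whole file becomes addressable memory

def read_value(mm, offset, length):
    # A page already in the kernel page cache is served at memory speed;
    # a miss raises a page fault that the kernel resolves with a device read.
    return mm[offset:offset + length]
```

Because the kernel page cache is the only cache in this scheme, data already in memory is reached without per-request user-space cache overhead, which is the point the abstract makes.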

2016 ◽  
Author(s):  
Rutger A. Vos

The challenges posed by large data volumes produced by high-throughput nucleotide sequencing technologies are well known. This document establishes ten simple rules for coping with these challenges. At the level of master data management, (1) data triage reduces data volumes; (2) some lossless data representations are much more compact than others; (3) careful management of data replication reduces wasted storage space. At the level of data analysis, (4) automated analysis pipelines obviate the need for storing work files; (5) virtualization reduces the need for data movement and bandwidth consumption; (6) tracking of data and analysis provenance will generate a paper trail to better understand how results were produced. At the level of data access and sharing, (7) careful modeling of data movement patterns reduces bandwidth consumption and haphazard copying; (8) persistent, resolvable identifiers for data reduce ambiguity caused by data movement; (9) sufficient metadata enables more effective collaboration. Finally, because of rapid developments in HTS technologies, (10) agile practices that combine loosely coupled modules operating on standards-compliant data are the best approach for avoiding lock-in. A generalized scenario is presented for data management from initial raw data generation to publication of result data.
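As one concrete reading of rule (8), an identifier can be derived from the data's content rather than its location, so references survive data movement. A minimal Python sketch, not taken from the rules document itself:

```python
import hashlib

def content_id(path, chunk=1 << 20):
    """Derive a location-independent identifier from file content.

    The hash is the same wherever the file is copied, so references
    to it stay unambiguous after data movement (rule 8 sketch).
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return "sha256:" + h.hexdigest()
```

Two copies of the same file yield the same identifier, so replication (rule 3) and movement between sites (rule 7) do not create ambiguity about which data a result refers to.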


2020 ◽  
Vol 29 (01n04) ◽  
pp. 2040007
Author(s):  
Yang Zhao ◽  
Fengyu Qian ◽  
Faquir Jain ◽  
Lei Wang

In-memory computing is an emerging technique to fulfill the fast-growing demand for high-performance data processing. This technique provides fast processing and high throughput by accessing data stored in the memory array rather than performing complicated operations and data movement on hard drives. For data processing, the most important computation is the dot product, which is also the core computation for applications such as deep learning neural networks, machine learning, etc. As multiplication is the key function in the dot product, it is critical to improve its performance and achieve faster in-memory processing. In this paper, we present a design with the ability to perform in-memory multi-bit multiplications. The proposed design is implemented using quantum-dot transistors, which enable multi-bit computations in the memory cell. Experimental results demonstrate that the proposed design provides reliable in-memory multi-bit multiplications with high density and high energy efficiency. Statistical analysis is performed using Monte Carlo simulations to investigate process variations and error effects.
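The quantum-dot transistor cell itself cannot be rendered in software, but the arithmetic it is built to accelerate can. A hedged Python sketch of the decomposition such designs exploit: a multi-bit multiplication assembled from 1-bit partial products, and a dot product assembled from those multiplications; the bit width and function names are assumptions.

```python
def multibit_multiply(a, b, bits=4):
    """Shift-and-add multiplication from 1-bit partial products.

    In-memory designs evaluate the AND-based partial products inside the
    memory array; here the same decomposition is written out in software.
    """
    product = 0
    for i in range(bits):
        if (b >> i) & 1:          # partial product: a AND the i-th bit of b
            product += a << i     # weighted by the bit position
    return product

def dot_product(xs, ws, bits=4):
    # The core kernel the paper targets (e.g., neural-network layers):
    # a sum of multi-bit multiplications.
    return sum(multibit_multiply(x, w, bits) for x, w in zip(xs, ws))

assert multibit_multiply(6, 11) == 66
assert dot_product([1, 2, 3], [4, 5, 6]) == 32
```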


2021 ◽  
Vol 75 (3) ◽  
pp. 76-82
Author(s):  
G.T. Balakayeva ◽  
D.K. Darkenbayev ◽  
M. Turdaliyev ◽  
...  

The volume of enterprise data has grown significantly in the last decade. Research has shown that over the past two decades the amount of data has increased approximately tenfold every two years, outpacing Moore's law, under which processor performance doubles over a similar period. About thirty thousand gigabytes of data are accumulated every second, and handling this stream requires more efficient data processing. Uploads of videos, photos, and messages by users of social networks accumulate large amounts of data, much of it unstructured. Enterprises therefore have to work with big data in different formats, which must be prepared in a specific way before modeling and computation can produce results. For these reasons, the research presented in this article on processing and storing large enterprise data, on developing a model and algorithms, and on applying new technologies is relevant. Undoubtedly, enterprise information flows will grow every year, making it important to solve the problems of storing and processing large amounts of data. The relevance of the article also stems from growing digitalization and the increasing move of professional activities online in many areas of modern society. The article provides a detailed analysis and study of these new technologies.
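To make the quoted growth rates concrete, a back-of-the-envelope Python comparison of data volume growing tenfold every two years against processor power doubling over the same interval; the twenty-year horizon is chosen only for illustration:

```python
# Relative growth after t years: data 10^(t/2), processors 2^(t/2).
for years in range(0, 21, 4):
    data = 10 ** (years / 2)   # tenfold every two years
    cpu = 2 ** (years / 2)     # doubling every two years (Moore's law)
    print(f"{years:2d} years: data x{data:.1e}, cpu x{cpu:.0f}, gap x{data / cpu:.1e}")
```

After two decades the gap is a factor of (10/2)^10 ≈ 10^7, which is why more efficient processing, not just faster hardware, is needed.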


2021 ◽  
Vol 14 (7) ◽  
pp. 1167-1174
Author(s):  
Zsolt István ◽  
Soujanya Ponnapalli ◽  
Vijay Chidambaram

Most modern data processing pipelines run on top of a distributed storage layer, and securing the whole system, and the storage layer in particular, against accidental or malicious misuse is crucial to ensuring compliance with rules and regulations. Enforcing data protection and privacy rules, however, stands at odds with the requirement to achieve ever higher access bandwidths and processing rates in large data processing pipelines. In this work we describe our proposal for a path forward that reconciles the two goals. We call our approach "Software-Defined Data Protection" (SDP). Its premise is simple, yet powerful: decoupling often-changing policies from request-level enforcement allows distributed smart storage nodes to implement the latter at line rate. Existing and future data protection frameworks can be translated to the same hardware interface, which allows storage nodes to offload enforcement efficiently both for company-specific rules and for regulations such as GDPR or CCPA. While SDP is a promising approach, several challenges remain to making this vision a reality. As we explain in the paper, overcoming them will require collaboration across several domains, including security, databases, and specialized hardware design.
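To illustrate the premise, a hypothetical Python sketch (not the authors' interface): slowly changing policies are compiled ahead of time into a flat per-object table, so request-level enforcement at the storage node reduces to a single lookup, the property that makes line-rate enforcement plausible. All names and types are assumptions.

```python
from enum import Flag, auto

class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()

def compile_policies(policies):
    """Translate high-level rules into a (principal, object) -> Access table."""
    table = {}
    for principal, obj, access in policies:
        table[(principal, obj)] = table.get((principal, obj), Access.NONE) | access
    return table

def enforce(table, principal, obj, wanted):
    # Request-level check: one dictionary lookup, no policy interpretation.
    return wanted in table.get((principal, obj), Access.NONE)

table = compile_policies([("analytics", "sales.db", Access.READ)])
assert enforce(table, "analytics", "sales.db", Access.READ)
assert not enforce(table, "analytics", "sales.db", Access.WRITE)
```

A GDPR- or CCPA-style rule and a company-specific rule would compile into the same table format, which is the sense in which different frameworks can share one enforcement interface.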


Author(s):  
Subhra Prosun Paul ◽  
Dr. Shruti Aggarwal

In today's world, sensor networks offer many opportunities for data management applications because of their low cost, reliability, scalability, and high-speed data processing, among other advantages. It is a great challenge to organize data effectively and to retrieve the appropriate data from the large volume of varied data sets in ad-hoc network databases, mobile databases, etc. The sensor network is necessary for routing data, analyzing the performance of data management activities, and incorporating data into the right application. Data management involves intranet and extranet query handling, data access mechanisms, data modeling, data movement algorithms, data warehousing, and data mining of the network database. Additionally, connectivity, design, and lifetime are important issues for sensor networks to perform all data management activities smoothly. In this paper, we survey research trends in sensor network data management over the last two decades, considering the challenges and issues of both sensor network databases and data management functions, using the Scopus and Web of Science databases. To analyze the data, separate assessments are made for Web of Science and Scopus from a global perspective, across parameters such as author, time, number of publications and citations, place, source, and document type. It is noticed that research in data management for sensor networks has grown significantly because of the popularity of this topic.
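As an illustration of the kind of aggregation described (not the authors' scripts), a short Python sketch counting exported Scopus / Web of Science records per year and per author; the record fields are assumptions:

```python
from collections import Counter

# Hypothetical exported records; real exports carry many more fields.
records = [
    {"year": 2019, "authors": ["A", "B"], "source": "Scopus"},
    {"year": 2020, "authors": ["A"], "source": "Web of Science"},
]

per_year = Counter(r["year"] for r in records)                  # time dimension
per_author = Counter(a for r in records for a in r["authors"])  # author dimension
print(per_year.most_common(), per_author.most_common())
```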


Author(s):  
David Japikse ◽  
Oleg Dubitsky ◽  
Kerry N. Oliphant ◽  
Robert J. Pelton ◽  
Daniel Maynes ◽  
...  

In the course of developing advanced data processing and advanced performance models, as presented in companion papers, a number of basic scientific and mathematical questions arose. This paper deals with questions such as uniqueness, convergence, statistical accuracy, training, and evaluation methodologies. The process of bringing together large data sets and utilizing them, with outside data supplementation, is considered in detail. Once these questions are carefully framed, emphasis is placed on how the new models, based on highly refined data processing, can best be used in the design world. The impact of this work on designs of the future is discussed. It is expected that this methodology will help designers move beyond contemporary design practices.


2014 ◽  
Vol 100 (8) ◽  
pp. 24-28
Author(s):  
Madhavi Vaidya ◽  
Shrinivas Deshpande ◽  
Vilas Thakare

Author(s):  
A. S. Garov ◽  
I. P. Karachevtseva ◽  
E. V. Matveev ◽  
A. E. Zubarev ◽  
I. V. Florinsky

We are developing a unified distributed communication environment for the processing of spatial data which integrates web, desktop, and mobile platforms and combines a volunteer computing model with public cloud capabilities. The main idea is to create a flexible working environment for research groups, which may be scaled according to the required data volume and computing power while keeping infrastructure costs to a minimum. It is based on the "single window" principle, which combines data access via geoportal functionality, processing possibilities, and communication between researchers. Using this software environment, the recently developed planetary information system (http://cartsrv.mexlab.ru/geoportal) will be updated. The new system will provide spatial data processing, analysis, and 3D visualization, and will be tested on freely available Earth remote sensing data as well as Solar System planetary images from various missions. This approach will make it possible to organize research and present results at a new technological level, with more possibilities for the immediate and direct reuse of research materials, including data, algorithms, methodology, and components. The new software environment is targeted at remote scientific teams and will provide access to existing distributed spatial information, for which we suggest implementing a user interface as an advanced front-end, e.g., for a virtual globe system.

