A Survey: Benchmarking and Performance Modelling of Data Intensive Applications

RealWear

Although NAND flash memory has revolutionized how we manage data in modern digital systems, significant improvements are still needed in flash-based storage systems to meet the requirements of emerging data-intensive applications. In this paper, we address the problem of NAND aging markers, which represent the degree of wear of NAND cells. Since all flash operations are affected by the wear status of NAND cells, an accurate NAND aging marker is critical for developing flash optimization techniques. Our evaluation study first shows that the existing P/E cycle-based aging marker (PeWear) is inadequate for estimating the actual aging status of NAND blocks, thus losing opportunities for further optimization. To overcome the limitations of PeWear, we propose a new NAND aging marker, RealWear, based on extensive characterization studies using real 3D TLC flash chips. By considering multiple variables that can affect NAND cell wear, RealWear can accurately indicate the actual wear status of NAND blocks at run time. Using three case studies, we demonstrate that RealWear is effective in enhancing the lifetime and performance of a flash storage system. Our experimental results show that RealWear can extend the lifetime of individual NAND blocks by 63% and reduce the garbage collection (GC) overhead by 21%. Furthermore, RealWear significantly mitigates read latency fluctuations, guaranteeing that read latency is bounded by at most 2 read retry operations.
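To make the PeWear limitation concrete, the sketch below compares two ways of choosing the "least worn" block. It is a toy illustration, not RealWear's model: the abstract does not list RealWear's variables, so the `raw_bit_errors` field is a hypothetical stand-in for a measured indicator of actual wear, and the block values are made up.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    pe_cycles: int       # program/erase cycles counted by the FTL (what PeWear uses)
    raw_bit_errors: int  # hypothetical stand-in for measured wear, e.g. errors on a sample read


def pick_by_pewear(blocks):
    """PeWear-style choice: the block with the fewest P/E cycles is treated as least worn."""
    return min(blocks, key=lambda b: b.pe_cycles)


def pick_by_observed_wear(blocks):
    """Hypothetical alternative: rank by the measured indicator, break ties by P/E cycles."""
    return min(blocks, key=lambda b: (b.raw_bit_errors, b.pe_cycles))


blocks = [
    Block(0, pe_cycles=1000, raw_bit_errors=120),
    Block(1, pe_cycles=1000, raw_bit_errors=40),
    Block(2, pe_cycles=990, raw_bit_errors=150),
]

if __name__ == "__main__":
    print(pick_by_pewear(blocks).block_id)         # block 2: fewest P/E cycles
    print(pick_by_observed_wear(blocks).block_id)  # block 1: fewest observed errors
```

Here the P/E count ranks block 2 as the freshest even though it shows the most errors, which is the kind of mismatch between counted cycles and actual cell wear that the abstract attributes to PeWear.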

6G Enabled Smart Infrastructure for Sustainable Society: Opportunities, Challenges, and Research Roadmap

Sensors ◽

10.3390/s21051709 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1709

Author(s):

Agbotiname Lucky Imoize ◽

Oluwadara Adedeji ◽

Nistha Tandiya ◽

Sachin Shetty

Keyword(s):

Wireless Communication ◽

Psychological Health ◽

Future Research ◽

Agriculture Education ◽

Social Psychological ◽

Research Issues ◽

Data Intensive ◽

Wireless Communication Network ◽

Data Intensive Applications

The 5G wireless communication network currently faces the challenge of limited data speed, exacerbated by the proliferation of billions of data-intensive applications. To address this problem, researchers are developing cutting-edge technologies for the envisioned 6G wireless communication standards to satisfy escalating demand for wireless services. Although some of the candidate technologies in the 5G standards will carry over to 6G wireless networks, 6G is expected to introduce key disruptive technologies that guarantee the desired quality of physical experience and achieve ubiquitous wireless connectivity. This article first provides foundational background on the evolution of wireless communication standards to give proper insight into the vision and requirements of 6G. Second, we provide a panoramic view of the enabling technologies proposed for 6G and introduce emerging 6G applications such as multi-sensory extended reality, digital replicas, and more. Next, we extensively discuss the technology-driven challenges and the social, psychological, health, and commercialization issues involved in actualizing 6G, together with probable solutions to these challenges. Additionally, we present new use cases of 6G technology in agriculture, education, media and entertainment, logistics and transportation, and tourism. Furthermore, we discuss the multi-faceted communication capabilities of 6G that will contribute significantly to global sustainability, and how 6G will bring about a dramatic change in the business arena. Finally, we highlight research trends, open research issues, and key takeaway lessons for future research in 6G wireless communication.

Exploratory Development of Data-intensive Applications

Proceedings of the International Conference on the Art, Science, and Engineering of Programming - Programming '17 ◽

10.1145/3079368.3079399 ◽

2017 ◽

Cited By ~ 1

Author(s):

Patrick Rein ◽

Marcel Taeumel ◽

Robert Hirschfeld ◽

Michael Perscheid

Keyword(s):

Data Intensive ◽

Data Intensive Applications

EZIOTracer

ACM SIGOPS Operating Systems Review ◽

10.1145/3469379.3469391 ◽

2021 ◽

Vol 55 (1) ◽

pp. 88-98

Author(s):

Mohammed Islam Naas ◽

François Trahay ◽

Alexis Colin ◽

Pierre Olivier ◽

Stéphane Rubini ◽

...

Keyword(s):

Analysis Framework ◽

Comprehensive Understanding ◽

Kernel Space ◽

Data Intensive ◽

Storage Performance ◽

Performance Requirements ◽

Memory Footprint ◽

Extreme Performance ◽

Data Intensive Applications ◽

Kernel Level

Tracing is a popular method for evaluating, investigating, and modeling the performance of today's storage systems. Tracing has become crucial with the increasing complexity of modern storage applications and systems, which manipulate an ever-increasing amount of data and are subject to extreme performance requirements. Many existing tracing tools focus on either the user level or the kernel level; however, we observe the lack of a unified tracer targeting both levels, which prevents a comprehensive understanding of modern applications' storage performance profiles. In this paper, we present EZIOTracer, a unified I/O tracer for both the (Linux) kernel and user spaces, targeting data-intensive applications. EZIOTracer is composed of a userland tracer and a kernel-space tracer, complemented by a trace analysis framework that can merge the output of the two tracers and, in particular, relate user-level events to kernel-level ones and vice versa. On the kernel side, EZIOTracer relies on eBPF to offer safe, low-overhead, low-memory-footprint, and flexible tracing capabilities. Using the FIO benchmark, we demonstrate the ability of EZIOTracer to track down I/O performance issues by relating events recorded at both the kernel and user levels. We show that this can be achieved with a relatively low overhead, ranging from 2% to 26% depending on the I/O intensity.
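The merge-and-relate step described above can be sketched as follows. This is not EZIOTracer's trace format or analysis framework: the `Event` fields, the event names, and the rule "a kernel event belongs to a user call if it runs on the same thread inside that call's time window" are all illustrative assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    ts_ns: int                              # only the timestamp is used for ordering
    tid: int = field(compare=False)         # thread id that produced the event
    level: str = field(compare=False)       # "user" or "kernel"
    name: str = field(compare=False)
    duration_ns: int = field(default=0, compare=False)


def merge_traces(user_events, kernel_events):
    """Merge two per-tracer streams, each already sorted by timestamp, into one timeline."""
    return list(heapq.merge(user_events, kernel_events))


def relate(user_event, kernel_events):
    """Kernel events on the same thread that fall inside the user call's time window."""
    start = user_event.ts_ns
    end = start + user_event.duration_ns
    return [k for k in kernel_events if k.tid == user_event.tid and start <= k.ts_ns <= end]


# Made-up records standing in for the output of the two tracers.
user = [
    Event(100, 7, "user", "pwrite", duration_ns=500),
    Event(2000, 7, "user", "fsync", duration_ns=3000),
]
kernel = [
    Event(150, 7, "kernel", "vfs_write"),
    Event(300, 9, "kernel", "vfs_read"),
    Event(2100, 7, "kernel", "block_rq_issue"),
]

if __name__ == "__main__":
    print([e.name for e in merge_traces(user, kernel)])
    print([k.name for k in relate(user[1], kernel)])
```

`heapq.merge` keeps the combined timeline sorted without re-sorting either stream, which suits tracers that each emit events in timestamp order; relating by thread and time window is one simple heuristic among several possible ones.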

Domain Metric Driven Decomposition of Data-Intensive Applications

2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) ◽

10.1109/issrew51248.2020.00071 ◽

2020 ◽

Author(s):

Matteo Camilli ◽

Carmine Colarusso ◽

Barbara Russo ◽

Eugenio Zimeo

Keyword(s):

Data Intensive ◽

Data Intensive Applications

Energy Efficient Storage Management Cooperated with Large Data Intensive Applications

2012 IEEE 28th International Conference on Data Engineering ◽

10.1109/icde.2012.47 ◽

2012 ◽

Cited By ~ 7

Author(s):

Norifumi Nishikawa ◽

Miyuki Nakano ◽

Masaru Kitsuregawa

Keyword(s):

Energy Efficient ◽

Large Data ◽

Storage Management ◽

Data Intensive ◽

Efficient Storage ◽

Data Intensive Applications

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics ◽

10.3390/electronics10121471 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1471

Author(s):

Jun-Yeong Lee ◽

Moon-Hyun Kim ◽

Syed Asif Raza Shah ◽

Sang-Un Ahn ◽

Heejun Yoon ◽

...

Keyword(s):

Data Storage ◽

Scale Up ◽

File Systems ◽

Performance Evaluations ◽

Distributed File Systems ◽

Data Intensive Computing ◽

Data Intensive ◽

Tremendous Amount ◽

Computing Environments ◽

And Performance

Data are important and ever-growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discovery. Redundant Array of Independent Disks (RAID), a well-known storage technology that combines multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array. In addition, RAID-based storage is difficult to scale up. To mitigate this problem, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities, where tremendous amounts of data have to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre, and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure in a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which must be considered in data-intensive computing environments.
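A minimal sketch of the kind of sequential read/write measurement such a comparison rests on is shown below. It is not the study's benchmark procedure (the abstract does not name its tools); the file name, sizes, and the mount path in the comment are hypothetical, and a real comparison across Ceph, GlusterFS, Lustre, and EOS would also vary file sizes, access patterns, and client counts.

```python
import os
import tempfile
import time


def sequential_write_read(directory, size_mib=64, block_kib=1024):
    """Write one file sequentially, read it back, and return (write_mib_s, read_mib_s)."""
    path = os.path.join(directory, "bench.dat")  # hypothetical file name
    block = os.urandom(block_kib * 1024)          # incompressible payload
    n_blocks = (size_mib * 1024) // block_kib

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to flush data to the file system
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        # Without dropping caches, this read is likely served from the local page cache.
        while chunk := f.read(block_kib * 1024):
            read_bytes += len(chunk)
    read_s = time.perf_counter() - start

    os.remove(path)
    mib = 1024 * 1024
    return (n_blocks * len(block)) / mib / write_s, read_bytes / mib / read_s


if __name__ == "__main__":
    # Point this at a FUSE mount of the file system under test (e.g. a hypothetical /mnt/dfs).
    with tempfile.TemporaryDirectory() as d:
        w, r = sequential_write_read(d, size_mib=16)
        print(f"write {w:.1f} MiB/s, read {r:.1f} MiB/s")
```

Calling `fsync` before stopping the write timer keeps the write figure from measuring only buffered writes; the read figure carries the page-cache caveat noted in the comment.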

Custom templates based heterogeneous resource allocation for data-intensive applications

10.32469/10355/86482 ◽

2020 ◽

Author(s):

Ronny Bazan Antequera

Keyword(s):

High Performance ◽

Real Data ◽

University Of Missouri ◽

Application Performance ◽

Data Intensive ◽

Edge Based ◽

The Right ◽

Heterogeneous Cloud ◽

Data Intensive Applications ◽

Cloud Resources

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI-COLUMBIA AT REQUEST OF AUTHOR.] The growth of data-intensive applications in science and engineering fields (e.g., bioinformatics, cybermanufacturing) demands the use of high-performance computing resources. However, the local resources available to data-intensive applications usually have limited capacity and availability due to sizable upfront costs. Moreover, using remote public resources presents constraints at the private edge network domain. Specifically, misconfigured network policies cause bottlenecks due to cross-traffic from other applications attempting to use shared networking resources. Additionally, selecting the right remote resources can be cumbersome, especially for users who need to weigh nonfunctional requirements such as performance, security, and cost when executing their applications. Data-intensive applications have recurrent deployments and similar infrastructure requirements that can be addressed by creating templates. In this thesis, we handle application requirements through intelligent resource 'abstractions' coupled with 'reusable' approaches that save time and effort in deploying new cloud architectures. Specifically, we design a novel custom template middleware that can retrieve blueprints of resource configuration, technical/policy information, and benchmarks of workflow performance to facilitate repeatable/reusable resource composition. The middleware adopts a hybrid recommendation methodology (online and offline recommendation) that leverages a catalog to rapidly check the correctness of a custom template solution before and during resource consumption. Further, it prescribes application adaptations by fostering effective social interactions during the application's scaling stages.
Based on the above approach, we organize the thesis contributions under two main thrusts: (i) Custom Templates for Cloud Networking for Data-intensive Applications: this involves scheduling transit selection and engineering at the campus edge based on real-time policy control. Our solution ensures prioritized application performance delivery for multi-tenant traffic profiles from a diverse set of real data-intensive applications in bioinformatics. (ii) Custom Templates for Cloud Computing for Data-intensive Applications: this involves recommending cloud resources for data-intensive applications based on a custom template catalog. We develop a novel expert system approach, implemented as middleware, that abstracts data-intensive application requirements for custom template composition. We uniquely consider heterogeneous cloud resource selection for deploying cloud architectures for real data-intensive applications in cybermanufacturing.
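The catalog-based recommendation idea can be illustrated with a plain constraint filter followed by a ranking. This is a simplified sketch under stated assumptions, not the thesis's hybrid online/offline recommender or expert system: the `Template` and `Requirements` fields, the template names, and the numbers are hypothetical stand-ins for the performance, security, and cost requirements mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    throughput_gbps: float  # hypothetical benchmarked workflow performance
    encrypted: bool         # hypothetical stand-in for a security/policy property
    cost_per_hour: float


@dataclass(frozen=True)
class Requirements:
    min_throughput_gbps: float
    needs_encryption: bool
    max_cost_per_hour: float


def recommend(catalog, req):
    """Keep templates that meet every hard requirement; order cheapest first, then fastest."""
    feasible = [
        t for t in catalog
        if t.throughput_gbps >= req.min_throughput_gbps
        and (t.encrypted or not req.needs_encryption)
        and t.cost_per_hour <= req.max_cost_per_hour
    ]
    return sorted(feasible, key=lambda t: (t.cost_per_hour, -t.throughput_gbps))


catalog = [
    Template("small-vm-1g", 1.0, True, 0.20),
    Template("gpu-node-10g", 10.0, True, 3.50),
    Template("bulk-transfer-10g", 10.0, False, 1.10),
    Template("storage-node-10g", 10.0, True, 1.60),
    Template("hpc-node-40g", 40.0, True, 2.80),
]

if __name__ == "__main__":
    req = Requirements(min_throughput_gbps=5.0, needs_encryption=True, max_cost_per_hour=3.0)
    print([t.name for t in recommend(catalog, req)])
```

Treating requirements as hard filters before ranking means an unaffordable or non-compliant template is never recommended, however fast it is; a real recommender might instead score soft preferences or learn from past deployments.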

NSM: a distributed storage architecture for data-intensive applications

20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings. ◽

10.1109/mass.2003.1194842 ◽

2003 ◽

Cited By ~ 4

Author(s):

Z. Ali ◽

Q. Malluhi

Keyword(s):

Distributed Storage ◽

Data Intensive ◽

Data Intensive Applications
