Main memory
Recently Published Documents

TOTAL DOCUMENTS: 990 (last five years: 198)
H-INDEX: 50 (last five years: 4)

2022, Vol 16 (3), pp. 1-26
Author(s): Jerry Chun-Wei Lin, Youcef Djenouri, Gautam Srivastava, Yuanfa Li, Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings, the collected data are far too large to fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the patterns discovered by the framework. In addition, two data structures, called the sidset and the utility-linked list, are used in the framework to accelerate the computation of the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory usage, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.
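
To make the utility computation concrete, here is a minimal Python sketch of the map/reduce flavor of the approach, reduced to aggregating item utilities across a quantitative sequence database. The unit-profit table, the toy database, and all function names are illustrative assumptions; the paper's sidset and utility-linked list structures and its full three-stage pipeline are not reproduced here.

```python
# Hypothetical sketch (not the authors' code): a map/reduce-style pass that
# computes the utility of single items in a quantitative sequence database.
# Each sequence is a list of itemsets; each itemset maps item -> quantity.
from collections import defaultdict

UNIT_PROFIT = {"a": 3, "b": 1, "c": 2}          # assumed external utility table

def map_phase(sequence):
    """Emit (item, local_utility) pairs for one sequence (the 'map' stage)."""
    local = defaultdict(int)
    for itemset in sequence:
        for item, qty in itemset.items():
            local[item] += qty * UNIT_PROFIT[item]
    return list(local.items())

def reduce_phase(emitted_pairs):
    """Aggregate local utilities across sequences (the 'reduce' stage)."""
    total = defaultdict(int)
    for item, utility in emitted_pairs:
        total[item] += utility
    return dict(total)

database = [
    [{"a": 2, "b": 1}, {"c": 3}],               # sequence 1
    [{"a": 1}, {"b": 2, "c": 1}],               # sequence 2
]
pairs = [p for seq in database for p in map_phase(seq)]
print(reduce_phase(pairs))                      # {'a': 9, 'b': 3, 'c': 8}
```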


2022, Vol 27 (2), pp. 223-234
Author(s): Jinbao Wang, Zhuojun Duan, Xixian Han, Donghua Yang

2022, Vol 21 (1), pp. 1-22
Author(s): Dongsuk Shin, Hakbeom Jang, Kiseok Oh, Jae W. Lee

A long battery life is a first-class design objective for mobile devices, and main memory accounts for a major portion of their total energy consumption. Moreover, the energy consumed by memory is expected to increase further with ever-growing demands for bandwidth and capacity. A hybrid memory system with both DRAM and PCM can be an attractive solution to provide additional capacity and reduce standby energy. Although PCM provides much greater density than DRAM, its longer access latency and limited write endurance make it challenging to use as main memory. To address this challenge, this article introduces CAMP, a novel DRAM Cache Architecture for Mobile platforms with PCM-based main memory. A DRAM cache in this environment must filter most of the writes to PCM to increase its lifetime, and it must deliver high efficiency even at the relatively small DRAM cache sizes that mobile platforms can afford. To this end, CAMP divides the DRAM space into two regions: a page cache that exploits spatial locality in a bandwidth-efficient manner and a dirty block buffer that maximally filters writes. CAMP improves performance and energy-delay product by 29.2% and 45.2%, respectively, over a baseline PCM-oblivious DRAM cache, while increasing PCM lifetime by 2.7×. CAMP also improves performance and energy-delay product by 29.3% and 41.5%, respectively, over a state-of-the-art design with a dirty block buffer, while increasing PCM lifetime by 2.5×.
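
The following is a hedged sketch, not the CAMP implementation, of how a two-region DRAM cache in front of PCM might route traffic: page-granularity fills serve reads, while a small dirty-block buffer absorbs writes and only spills to PCM on overflow. Class and method names, capacities, and the eviction policy are all assumed for illustration.

```python
# Illustrative two-region DRAM cache in front of PCM: a page cache for read
# locality and a dirty-block buffer that filters writes to PCM.
class HybridDramCache:
    def __init__(self, page_capacity=1024, dirty_capacity=256):
        self.page_cache = {}          # page_addr -> data (spatial locality, reads)
        self.dirty_buffer = {}        # block_addr -> data (filters writes to PCM)
        self.page_capacity = page_capacity
        self.dirty_capacity = dirty_capacity
        self.pcm_writes = 0

    def read(self, page_addr):
        if page_addr not in self.page_cache:
            if len(self.page_cache) >= self.page_capacity:
                # Crude FIFO-style eviction; clean pages can simply be dropped.
                self.page_cache.pop(next(iter(self.page_cache)))
            self.page_cache[page_addr] = self._fetch_from_pcm(page_addr)
        return self.page_cache[page_addr]

    def write(self, block_addr, data):
        # Dirty data goes to the block buffer; PCM is only written on overflow.
        if block_addr not in self.dirty_buffer and \
                len(self.dirty_buffer) >= self.dirty_capacity:
            victim, victim_data = self.dirty_buffer.popitem()
            self._write_back_to_pcm(victim, victim_data)
        self.dirty_buffer[block_addr] = data

    def _fetch_from_pcm(self, page_addr):
        return b"\x00" * 4096         # placeholder for a PCM page read

    def _write_back_to_pcm(self, block_addr, data):
        self.pcm_writes += 1          # each write-back costs PCM endurance
```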


2022, Vol 21 (1), pp. 1-25
Author(s): Kazi Asifuzzaman, Rommel Sánchez Verdejo, Petar Radojković

It is questionable whether DRAM will continue to scale and meet the needs of next-generation systems, so significant effort is being invested in the research and development of novel memory technologies. One candidate for next-generation memory is Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), an emerging non-volatile memory with considerable potential for the requirements of different computing systems. Although a novel technology, STT-MRAM devices are already approaching DRAM in terms of capacity, frequency, and device size. Yet while STT-MRAM has received significant attention from major memory manufacturers, academic research on STT-MRAM main memory remains marginal. This is mainly due to the lack of publicly available, detailed timing and current parameters for this technology, which are required for reliable main-memory simulation of performance and power. This study demonstrates an approach to cycle-accurate simulation of STT-MRAM main memory and is the first from academia to release detailed timing and current parameters for this technology, enabling researchers to conduct reliable system-level simulation of STT-MRAM using widely accepted existing simulation infrastructure. The results show a fairly narrow overall performance deviation in response to significant variations in key timing parameters, and the power-consumption experiments identify the power component most affected by STT-MRAM.
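
As an illustration of what such a release enables, the snippet below shows the kind of DDR-style timing and current parameters a cycle-accurate main-memory simulator consumes, together with one derived quantity. The parameter names follow common DRAM conventions; the numeric values are placeholders chosen for the example and are not the parameters published by this study.

```python
# Illustrative only: the kind of timing/current parameters a cycle-accurate
# simulator needs for an STT-MRAM main-memory model. Values are placeholders,
# NOT the parameters released by the paper.
sttmram_params = {
    "tCK_ns":  1.25,     # clock period
    "CL":      14,       # CAS latency (cycles)
    "tRCD":    14,       # activate-to-read/write delay (cycles)
    "tRP":     14,       # precharge delay (cycles)
    "tRAS":    32,       # row active time (cycles)
    "IDD0_mA": 55,       # activate-precharge current
    "IDD4W_mA": 140,     # burst write current
}

def row_cycle_ns(p):
    """tRC = tRAS + tRP, converted to nanoseconds."""
    return (p["tRAS"] + p["tRP"]) * p["tCK_ns"]

print(f"row cycle time: {row_cycle_ns(sttmram_params):.2f} ns")  # 57.50 ns
```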


Electronics, 2022, Vol 11 (2), pp. 240
Author(s): Beomjun Kim, Yongtae Kim, Prashant Nair, Seokin Hong

STT-RAM (Spin-Transfer Torque Random Access Memory) appears to be a viable alternative to SRAM-based on-chip caches. Thanks to its high density and low leakage power, STT-RAM can be used to build massive-capacity last-level caches (LLCs). Unfortunately, STT-RAM has a much longer write latency and a much greater write energy than SRAM. To cope with these challenges, researchers have developed hybrid caches made up of SRAM and STT-RAM regions. In hybrid caches, an intelligent block placement policy is essential in order to store as many write-intensive blocks as possible in the SRAM region. This paper proposes ADAM, an adaptive block placement framework for hybrid caches that incorporates metadata embedding. When a cache block is evicted from the LLC, ADAM embeds metadata (i.e., write intensity) into the block. The metadata embedded in the cache block are then extracted and used to determine the block's write intensity when it is fetched from main memory. Our research demonstrates that ADAM can enhance performance by 26% (on average) when compared to a baseline block placement scheme.
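
A minimal sketch of the placement idea follows: the write intensity observed while a block lives in the LLC is embedded as a small tag on eviction and consulted on the next fetch to pick the SRAM or STT-RAM region. The counter width, threshold, and function names are assumptions for illustration, not ADAM's exact mechanism.

```python
# Hedged sketch: blocks carry a small write-intensity tag when evicted, and
# the tag steers them to the SRAM or STT-RAM region when fetched again.
WRITE_INTENSITY_THRESHOLD = 4      # assumed threshold

class CacheBlock:
    def __init__(self, addr):
        self.addr = addr
        self.write_count = 0       # incremented on every write hit in the LLC
        self.embedded_tag = 0      # metadata carried back to main memory

def on_evict(block):
    """Embed the observed write intensity into the block's metadata."""
    block.embedded_tag = min(block.write_count, 15)   # e.g., a 4-bit counter

def on_fetch(block):
    """Choose the target region of the hybrid LLC using the embedded tag."""
    if block.embedded_tag >= WRITE_INTENSITY_THRESHOLD:
        return "SRAM"      # write-intensive: keep writes away from STT-RAM
    return "STT-RAM"       # read-mostly: exploit the dense, low-leakage region

blk = CacheBlock(addr=0x80)
blk.write_count = 6
on_evict(blk)
print(on_fetch(blk))       # SRAM
```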


2022
Author(s): Yihe Huang, William Qian, Eddie Kohler, Barbara Liskov, Liuba Shrira

2022, pp. 104450
Author(s): Peng Liu, Xiaojun Cai, Zhaoyan Shen, Mengying Zhao, Zhiping Jia

Author(s): Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, ...

This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for Extreme Scale Simulations', held March 1–6, 2020, at Schloss Dagstuhl and attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? And what about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation?

Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised; the key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help clarify the particular reliability requirements, the construction of remedies is currently wide open. One avenue is to refine and improve system- or application-level checkpointing and rollback strategies for the case in which an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar.

The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications, and systems in achieving resilience for extreme-scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
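
A quick back-of-the-envelope check of the figures quoted above, assuming an energy price of roughly 0.10 EUR per kWh (the tariff is an assumption, not stated in the abstract):

```python
# Sanity check of the quoted exascale figures: 48 h at 20 MW, one exaFLOP/s,
# and an assumed price of 0.10 EUR/kWh.
hours = 48
power_mw = 20
energy_kwh = power_mw * 1_000 * hours            # 20 MW = 20,000 kW
cost_eur = energy_kwh * 0.10                     # assumed tariff

flops_per_s = 1e18                               # one exaFLOP/s
total_flop = flops_per_s * hours * 3600

print(f"{energy_kwh:,.0f} kWh, ~{cost_eur:,.0f} EUR, {total_flop:.1e} FLOP")
# 960,000 kWh, ~96,000 EUR, 1.7e+23 FLOP
```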


Author(s): Gajanan Digambar Gaikwad

Abstract: The operating system offers a service known as memory management, which manages primary memory and moves processes back and forth between main memory and disk during execution. Temporarily moving a process from primary memory to the hard disk so that the memory becomes available for other processes is known as swapping. Page replacement techniques are the methods by which the operating system decides which memory pages to swap out and write to disk whenever a page of main memory needs to be allocated. The policies for selecting the page to swap out when a page fault occurs, to make room for a new page, are called page replacement algorithms. In this paper, a strategy for identifying the refresh rate of the 'Aging' page replacement algorithm is presented and evaluated. Keywords: Aging algorithm, page replacement algorithm, refresh rate, virtual memory management.
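
For reference, a minimal sketch of the Aging algorithm with a configurable refresh (aging) interval is given below. The 8-bit counters, the per-access tick, and the interval value are illustrative simplifications; the paper's strategy for choosing the refresh rate is not reproduced here.

```python
# Minimal Aging page-replacement sketch: each frame keeps an 8-bit counter
# that is shifted right at every refresh and has the referenced bit ORed into
# its most significant position. The frame with the smallest counter is the
# eviction victim. The aging pass runs every `refresh_interval` accesses here,
# standing in for a periodic timer tick.
class AgingReplacer:
    def __init__(self, frames, refresh_interval):
        self.counters = {f: 0 for f in frames}
        self.referenced = {f: False for f in frames}
        self.refresh_interval = refresh_interval
        self.ticks = 0

    def access(self, frame):
        self.referenced[frame] = True
        self.ticks += 1
        if self.ticks % self.refresh_interval == 0:
            self._age()

    def _age(self):
        for f in self.counters:
            r_bit = 0x80 if self.referenced[f] else 0
            self.counters[f] = (self.counters[f] >> 1) | r_bit
            self.referenced[f] = False

    def victim(self):
        # Smallest counter = least recently/frequently used frame.
        return min(self.counters, key=self.counters.get)

pages = AgingReplacer(frames=["A", "B", "C"], refresh_interval=4)
for f in ["A", "B", "A", "C", "A", "B", "A", "A"]:
    pages.access(f)
print(pages.victim())      # C
```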


2021, Vol 13 (21), pp. 4437
Author(s): Anh Vu Vo, Debra F. Laefer, Jonathan Byrne

This paper introduces a genetic algorithm (GA) and a beam tracing algorithm, incorporated within a dual parallel computing framework, to optimize urban aerial laser scanning (ALS) missions for maximal vertical façade data capture, as needed by many three-dimensional reconstruction and modeling workflows. The optimization employs a low-density point cloud from the site of interest as a spatial representation of the urban scene. The GA is well suited to LiDAR flight path optimization because it can handle open-ended problems with many solutions, but it requires evaluating a very large number of candidates. The use of an initial point cloud allows realistic modeling of the urban environment in the optimization at the cost of high data input volumes. To cope with the computational and data demands, a dual parallel computing framework was devised, consisting of two layers of parallelization. In the upper layer, multiple evaluators work in parallel, in conjunction with a main multi-threaded GA optimizer, to perform GA operations and evaluate the flight paths. In the lower layer, each evaluator distributes its data and computation to multiple executors, which can reside on multiple physical nodes of a distributed-memory computing cluster, to evaluate its assigned flight paths. In addition to parallelism, the data partitioning in the lower layer allows out-of-core computation: data partitions are efficiently transferred between disk and memory so that only the relevant subsets of data are kept in main memory. The objective of the proposed method is threefold: (1) search for flight paths that yield the highest numbers of vertical points, (2) create a means to explicitly consider the detailed spatial configuration of urban environments, and (3) ensure that the proposed optimization strategy is fast and scales to large problem sizes. Multiple experiments demonstrated the success of the proposed method. Converged results were achieved after dozens of generations within two hours. The two flight paths identified by the GA as the most and the least optimal candidates were deployed in real flight missions. The optimal flight path captured 16% more vertical points than the least optimal one, slightly higher than the 13% predicted. Both layers of parallelization were efficient, achieving speedups of 13.1 on 16 executors in the lower layer and 3.2 on 4 evaluators in the upper layer. The two complementary layers of parallelization allowed flexible and efficient use of distributed computing resources to reduce the runtime. The scalability of the proposed approach was successfully demonstrated up to a data size of 460 million points, and the optimization results were realistic and aligned well with the test flight results.
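
The upper parallel layer can be pictured with the simplified, single-machine sketch below: a GA whose fitness evaluations run concurrently across worker processes, with the fitness function standing in for the beam-tracing count of captured vertical points. All names and parameters are illustrative; the actual framework adds a second layer that partitions the point cloud across executors on a distributed cluster.

```python
# Simplified GA with parallel fitness evaluation (upper-layer analogue only).
import random
from concurrent.futures import ProcessPoolExecutor

def fitness(flight_path):
    # Placeholder: the real evaluator traces laser beams against the point
    # cloud and counts captured vertical-facade points.
    return -sum((x - 0.5) ** 2 for x in flight_path)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(path, rate=0.1):
    return [x + random.gauss(0, 0.05) if random.random() < rate else x
            for x in path]

def run_ga(pop_size=20, genes=8, generations=30):
    population = [[random.random() for _ in range(genes)] for _ in range(pop_size)]
    with ProcessPoolExecutor() as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, population))        # parallel evaluation
            ranked = [p for _, p in sorted(zip(scores, population), reverse=True)]
            parents = ranked[: pop_size // 2]
            children = [mutate(crossover(*random.sample(parents, 2)))
                        for _ in range(pop_size - len(parents))]
            population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print(run_ga())
```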

