data movement
Recently Published Documents

2022 ◽  
Vol 15 (3) ◽  
pp. 1-32
Nikolaos Alachiotis ◽  
Panagiotis Skrimponis ◽  
Manolis Pissadakis ◽  
Dionisios Pnevmatikatos

Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves the way for highly compute- and/or memory-intensive applications to potentially deploy all CPU and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on different servers than the ones hosting accelerator resources, thereby requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox disaggregated datacenter prototype, the present work explores the potential of deploying custom acceleration units, implemented in FPGA technology, adjacent to the disaggregated-memory controller on memory bricks (in dReDBox terminology) to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years have focused on accelerating computation; however, since the PLF is a data-intensive, memory-bound operation, performance remains limited by data movement, and memory disaggregation only exacerbates the problem.
We describe two near-memory processing models, one that addresses the problem of workload distribution to memory bricks, which is particularly tailored toward larger genomes (e.g., plants and mammals), and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.
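The memory-bound nature of the PLF can be illustrated with a minimal sketch of one conditional-likelihood update for a 4-state DNA model; the function name, data layout, and identity transition matrices below are illustrative and not taken from the paper:

```python
# Minimal sketch of one Phylogenetic Likelihood Function (PLF) update
# step for a 4-state DNA model. P_left/P_right are hypothetical
# transition-probability matrices; real tools derive them from branch
# lengths and a substitution model.

STATES = 4  # A, C, G, T

def plf_update(left, right, P_left, P_right):
    """Compute parent conditional likelihoods from two child vectors.

    Each per-site vector is read once and a new one written once, so the
    kernel streams far more data than it reuses -- the memory-bound
    behaviour the abstract describes.
    """
    parent = []
    for l_site, r_site in zip(left, right):  # one entry per alignment site
        site = []
        for s in range(STATES):
            from_left = sum(P_left[s][t] * l_site[t] for t in range(STATES))
            from_right = sum(P_right[s][t] * r_site[t] for t in range(STATES))
            site.append(from_left * from_right)
        parent.append(site)
    return parent

# With identity transitions, the parent likelihood is simply the
# element-wise product of the children's likelihoods.
I = [[1.0 if s == t else 0.0 for t in range(STATES)] for s in range(STATES)]
left = [[0.25, 0.25, 0.25, 0.25]]
right = [[1.0, 0.0, 0.0, 0.0]]
print(plf_update(left, right, I, I))
```

For an alignment of millions of sites, each update touches every site's vectors exactly once, which is why placing this kernel next to the memory controller avoids moving the bulk of the data.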

2022 ◽  
Vol 27 (2) ◽  
pp. 1-30
Jaechul Lee ◽  
Cédric Killian ◽  
Sebastien Le Beux ◽  
Daniel Chillet

The energy consumption of manycore architectures is dominated by data movement, which calls for energy-efficient and high-bandwidth interconnects. To overcome the bandwidth limitation of electrical interconnects, integrated optics appears as a promising technology. However, it suffers from a high power overhead related to low laser efficiency, which calls for techniques and methods to improve its energy cost. Besides, approximate computing is emerging as an efficient method to reduce energy consumption and improve the execution speed of embedded computing systems. It relies on allowing reduced accuracy of data at the cost of a tolerable application output error. In this context, the work presented in this article exploits both features by defining approximate communications for error-tolerant applications. We propose a method to design a realistic and scalable nanophotonic interconnect supporting approximate data transmission and power adaptation according to the communication distance to improve energy efficiency. For this purpose, data can be sent by mixing low-power optical signals with truncation of the Least Significant Bits (LSBs) of floating-point numbers, while the overall power is adapted according to the communication distance. We define two ranges of communications, short and long, which require only four power levels. This reduces the area and power overhead needed to control the laser output power. A transmission model allows estimating the laser power according to the targeted Bit Error Rate (BER) and the number of truncated bits, while the optical network interface allows configuring, at runtime, the number of approximated and truncated bits and the laser output powers. We explore the energy efficiency provided by each communication scheme, and we investigate the error resilience of the benchmarks over several approximation and truncation schemes.
Simulation results on ApproxBench applications show that, compared to an interconnect involving only robust communications, approximations in the optical transmission lead to up to 53% laser power reduction with limited application-level degradation (less than 9% output error). Finally, we show that our solution is scalable and leads to a 10% reduction in total energy consumption, a 35× reduction in laser driver size, and a 10× reduction in laser controller size compared to a state-of-the-art solution.
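The floating-point truncation described above can be sketched as zeroing the low mantissa bits of an IEEE-754 single-precision value before transmission; `truncate` is an illustrative helper (in the paper, the optical network interface would set the bit count at runtime), not the authors' implementation:

```python
# Hedged sketch of Least-Significant-Bit truncation of a float32:
# clear the n lowest mantissa bits so fewer bits need a robust
# (high-power) optical transmission.
import struct

def truncate(value: float, n_bits: int) -> float:
    """Zero the n_bits least-significant mantissa bits of a float32.

    The mantissa occupies bits 0-22 of the 32-bit pattern, so n_bits
    should stay below 23 to leave the exponent and sign untouched.
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw &= ~((1 << n_bits) - 1)  # clear the low mantissa bits
    (approx,) = struct.unpack("<f", struct.pack("<I", raw))
    return approx

x = 3.14159265
for n in (0, 8, 16):
    print(n, truncate(x, n), abs(x - truncate(x, n)) / x)
```

Because the cleared bits carry the smallest place values, the relative error grows gradually with the truncation count, which is what makes the accuracy/power trade-off tunable per application.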

2022 ◽  
Vol 18 (2) ◽  
pp. 1-22
Gokul Krishnan ◽  
Sumit K. Mandal ◽  
Chaitali Chakrabarti ◽  
Jae-Sun Seo ◽  
Umit Y. Ogras

With the widespread use of Deep Neural Networks (DNNs), machine learning algorithms have evolved in two diverse directions—one with ever-increasing connection density for better accuracy and the other with more compact sizing for energy efficiency. The increase in connection density increases on-chip data movement, which makes efficient on-chip communication a critical function of the DNN accelerator. The contribution of this work is threefold. First, we illustrate that the point-to-point (P2P)-based interconnect is incapable of handling a high volume of on-chip data movement for DNNs. Second, we evaluate P2P and network-on-chip (NoC) interconnect (with a regular topology such as a mesh) for SRAM- and ReRAM-based in-memory computing (IMC) architectures for a range of DNNs. This analysis shows the necessity for the optimal interconnect choice for an IMC DNN accelerator. Finally, we perform an experimental evaluation for different DNNs to empirically obtain the performance of the IMC architecture with both NoC-tree and NoC-mesh. We conclude that, at the tile level, NoC-tree is appropriate for compact DNNs employed at the edge, and NoC-mesh is necessary to accelerate DNNs with high connection density. Furthermore, we propose a technique to determine the optimal choice of interconnect for any given DNN. In this technique, we use analytical models of NoC to evaluate end-to-end communication latency of any given DNN. We demonstrate that the interconnect optimization in the IMC architecture results in up to 6× improvement in energy-delay-area product for VGG-19 inference compared to the state-of-the-art ReRAM-based IMC architectures.
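As a rough illustration of the kind of analytical NoC latency model mentioned above, the sketch below estimates average end-to-end latency on an n×n mesh under uniform random traffic; the per-hop cycle count and wormhole-pipelining assumption are ours, not taken from the paper:

```python
# Illustrative hop-count latency model for an n x n NoC mesh.
# Assumptions (ours): uniform random traffic, 2 cycles per router hop,
# wormhole switching so body flits pipeline behind the header.

def mesh_avg_hops(n: int) -> float:
    """Average Manhattan distance between two uniformly random tiles
    on an n x n mesh: E|x1 - x2| = (n^2 - 1) / (3n) per axis."""
    per_axis = (n * n - 1) / (3 * n)
    return 2 * per_axis

def latency_cycles(hops: float, packet_flits: int,
                   cycles_per_hop: int = 2) -> float:
    # Header traverses every hop; the remaining flits follow one per cycle.
    return hops * cycles_per_hop + packet_flits - 1

for n in (4, 8, 16):
    h = mesh_avg_hops(n)
    print(f"{n}x{n} mesh: avg hops {h:.2f}, "
          f"latency {latency_cycles(h, 16):.1f} cycles")
```

A model of this shape makes the interconnect choice analytical: plugging a DNN's tile-to-tile traffic pattern into the hop-count term predicts end-to-end latency without cycle-accurate simulation.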

2022 ◽  
Vol 15 (2) ◽  
pp. 1-31
Joel Mandebi Mbongue ◽  
Danielle Tchuinkou Kwadjo ◽  
Alex Shuping ◽  
Christophe Bobda

Cloud deployments now increasingly exploit Field-Programmable Gate Array (FPGA) accelerators as part of virtual instances. While cloud FPGAs are still essentially single-tenant, the growing demand for efficient hardware acceleration paves the way to FPGA multi-tenancy. It then becomes necessary to explore architectures, design flows, and resource management features that aim at exposing multi-tenant FPGAs to cloud users. In this article, we discuss a hardware/software architecture that supports provisioning space-shared FPGAs in Kernel-based Virtual Machine (KVM) clouds. The proposed architecture introduces an FPGA organization that improves hardware consolidation and supports hardware elasticity with minimal data movement overhead. It also relies on VirtIO to decrease communication latency between the hardware and software domains. Prototyping the proposed architecture on a Virtex UltraScale+ FPGA demonstrated near-specification maximum frequency for on-chip data movement and high throughput in virtual instance access to hardware accelerators. We demonstrate performance similar to single-tenant deployment while increasing FPGA utilization, which is one of the goals of virtualization. Overall, our FPGA design achieved about 2× higher maximum frequency than the state of the art and a bandwidth of up to 28 Gbps on a 32-bit data width.

2021 ◽  
Vol 35 (1) ◽  
pp. 25-27
Constance L. Milton

The advancement of a healthcare discipline relies on the discipline's ability to produce rigorous scholarship activities and products. The healthcare disciplines, especially nursing, are facing ever-changing priorities as shortages loom and exhaustion permeates the climate. Empirical public health priorities during the pandemic have dominated professional healthcare literature and global health communications. This article offers ethical implications for the discipline of nursing as it seeks the advancement of scholarship. Topics include straight-thinking issues surrounding nursing and medicine national policy statements, the big data movement, and the evolutionary return of competency-based nurse education.

2021 ◽  
Vol 14 (1) ◽  
pp. 19
Sakiko Kanbara ◽  
Rajib Shaw

This paper addresses open data, open governance, and disruptive/emerging technologies from the perspective of disaster risk reduction (DRR). Through an in-depth literature review of open governance, the paper identifies five principles for open data adopted in the disaster risk reduction field: (1) open by default, (2) accessible, licensed, and documented, (3) co-created, (4) locally owned, and (5) communicated in ways that meet the needs of diverse users. The paper also analyzes the evolution of emerging technologies and their application in Japan. The four-phase evolution of disaster risk reduction is described as DRR 1.0 (Isewan typhoon, 1959), DRR 2.0 (the Great Hanshin Awaji Earthquake, 1995), DRR 3.0 (the Great East Japan Earthquake and Tsunami: GEJE, 2011), and DRR 4.0 (post-GEJE). After the GEJE of 2011, different initiatives have emerged in open data, as well as in collaboration/partnership with tech firms for emerging technologies in DRR. This paper analyzes the July 2021 landslide in Atami and draws lessons based on the above-mentioned five principles. Some of the key lessons for the open data movement include characterizing open and usable data, local governance systems, moving from co-creating to co-delivering solutions, data democratization, and interpreting disaggregated data with community engagement. These lessons are useful outside Japan in terms of data licensing, adaptive governance, stakeholder usage, and community engagement. However, as governance systems are rooted in local decision-making and cultural contexts, some of these lessons need to be customized to local conditions. Open governance is still an evolving culture in many countries, and open data is considered an important tool for it.
While there is a trend toward developing open data for geospatial information, the discussion in this paper shows that it is important to have customized open data for people, wellbeing, and health care, and for balancing data privacy. The evolution of emerging technologies and their usage is proceeding faster than ever, while the governance systems employed to support and use emerging technologies need time to change and adapt. Therefore, it is very important to properly synchronize and customize open data, open governance, and emerging/disruptive technologies for their effective use in disaster risk reduction.

Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3179
Wenxing Chen ◽  
Baojuan Zheng ◽  
Jiaying Liu ◽  
Lianyan Li ◽  
Xiaobin Ren

Elevators are an essential indoor transportation tool in high-rise buildings. The world is advocating design concepts of safety, energy saving, and intelligence. We focus on improving the operation speed and utilization efficiency of elevator groups. This paper proposes a real-time reservation optimization algorithm for elevator groups and establishes a dynamic matrix iteration model. Ultra-wideband (UWB) indoor navigation technology is applied, which helps users quickly find elevators. The load equilibrium efficiency and running-time equilibrium efficiency of the elevator group are given. Moreover, data filtering criterion formulas for user waiting time and remaining elevator space are defined. Three numerical examples are given in this paper. Example 1 is a single elevator in an n-storey building. Example 2 compares different scheduling algorithms, such as the FCFS, SSTF, LOOK, and SCAN algorithms, and the results show that our method has the advantages of shorter total running time and fewer round trips. The matrix of numerical iteration results is then visualized, and the movement status of people on each floor can be observed. Example 3 introduces elevator group algorithms. For high-rise buildings, this paper adopts a high, medium, and low hierarchical management model; this model has high coordination as well as fast response, batch processing, and adaptive functions. Finally, we also discuss and compare the complexity of the single-elevator and elevator group algorithms. This method therefore has great development potential and practical application value, and deserves further study.
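Two of the baseline schedulers the paper compares against (FCFS and LOOK) can be contrasted with a small sketch that totals the floors travelled; the request list and starting floor below are illustrative, not from the paper:

```python
# Hedged sketch: total travel distance (in floors) of two classic
# scheduling policies for a single elevator serving a batch of requests.

def fcfs_travel(start, requests):
    """First-Come-First-Served: serve requests strictly in arrival order."""
    total, pos = 0, start
    for floor in requests:
        total += abs(floor - pos)
        pos = floor
    return total

def look_travel(start, requests):
    """LOOK: sweep upward through pending floors, then reverse direction."""
    up = sorted(f for f in requests if f >= start)
    down = sorted((f for f in requests if f < start), reverse=True)
    total, pos = 0, start
    for floor in up + down:
        total += abs(floor - pos)
        pos = floor
    return total

reqs = [9, 2, 11, 4, 7]
print("FCFS:", fcfs_travel(5, reqs))  # zig-zags between floors
print("LOOK:", look_travel(5, reqs))  # one sweep up, one sweep down
```

On this batch, LOOK covers half the distance of FCFS because it never reverses direction mid-sweep; the paper's reservation-based group algorithm is evaluated against exactly this kind of baseline.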

2021 ◽  
Yejin Yang ◽  
Juhee Jeon ◽  
Jaemin Son ◽  
Kyoungah Cho ◽  
Sangsig Kim

The processing of large amounts of data requires high energy efficiency and fast processing times for high-performance computing systems. However, conventional von Neumann computing systems have performance limitations because of bottlenecks in data movement between the separate processing and memory hierarchies, which cause latency and high power consumption. To overcome this hindrance, logic-in-memory (LIM) has been proposed, which performs both data processing and memory operations. Here, we present NAND and NOR LIM cells composed of silicon nanowire feedback field-effect transistors, whose configuration resembles that of CMOS logic gate circuits. The LIM can perform memory operations to retain its output logic under zero-bias conditions as well as logic operations with a high processing speed of nanoseconds. The newly proposed dynamic voltage-transfer characteristics verify the operating principle of the LIM. This study demonstrates that the NAND and NOR LIM have promising potential to resolve power and processing speed issues.

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2997
Luminita Hurbean ◽  
Doina Danaiata ◽  
Florin Militaru ◽  
Andrei-Mihail Dodea ◽  
Ana-Maria Negovan

Machine learning (ML) has already gained the attention of researchers involved in smart city (SC) initiatives, along with other advanced technologies such as IoT, big data, cloud computing, and analytics. In this context, researchers have realized that data can help make the SC happen; in particular, the open data movement has encouraged more research using machine learning. Based on this line of reasoning, the aim of this paper is to conduct a systematic literature review investigating open data-based machine learning applications in the six areas of smart cities. The results of this research reveal that: (a) machine learning applications using open data appear in all the SC areas, and specific ML techniques are identified for each area, with deep learning and supervised learning being the first choices; (b) open data platforms represent the most frequently used source of data; and (c) the challenges associated with open data utilization range from data quality, to frequency of data collection, to data consistency and format. Overall, the data synopsis as well as the in-depth analysis may be a valuable support and inspiration for future smart city projects.
