Changing computing paradigms towards power efficiency

Author(s):  
Pavel Klavík ◽  
A. Cristiano I. Malossi ◽  
Costas Bekas ◽  
Alessandro Curioni

Power awareness is fast becoming immensely important in computing, ranging from traditional high-performance computing applications to the new generation of data-centric workloads. In this work, we describe our efforts towards a power-efficient computing paradigm that combines low- and high-precision arithmetic. We showcase our ideas for the widely used kernel of solving systems of linear equations, which finds numerous applications in scientific and engineering disciplines as well as in large-scale data analytics, statistics and machine learning. Towards this goal, we developed tools for the seamless power profiling of applications at a fine-grain level. In addition, we verify previous work on post-FLOPS/W metrics and show that these can shed much more light on the power/energy profile of important applications.
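
A minimal sketch of the kind of low/high-precision combination the abstract describes is mixed-precision iterative refinement: do the expensive solves in float32 and accumulate residual corrections in float64. This illustrates the general technique, not necessarily the authors' exact scheme; the function name, tolerances, and test matrix are all illustrative.

```python
# Sketch of mixed-precision iterative refinement for Ax = b (the general
# technique; not the authors' code). Production implementations would
# factorize A32 once (e.g. LU) and reuse the factors for every solve.
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=50):
    """Cheap float32 solves, float64 residual accumulation."""
    A32 = A.astype(np.float32)          # low-precision copy used for solves
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                   # residual in high precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction, float32
        x += d.astype(np.float64)       # accumulate update in float64
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500 * np.eye(500)  # well-conditioned test
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))        # residual near float64 accuracy
```

The power argument is that the float32 solves dominate the arithmetic and run at higher throughput per watt, while the float64 residual loop restores full accuracy.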

Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 909-918
Author(s):  
Tongtong Kang ◽  
Zongwei Ma ◽  
Jun Qin ◽  
Zheng Peng ◽  
Weihao Yang ◽  
...  

Active metasurfaces, in which the optical properties of a metasurface device can be controlled by external stimuli, have attracted great research interest recently. For optical switching and modulation applications, high-performance active metasurfaces need to show high transparency, high power efficiency, ultrafast switching, and large-scale fabrication capability. This paper reports Au/VO2-based active metasurfaces meeting the requirements above. Centimeter-scale Au/VO2 metasurfaces are fabricated by polystyrene sphere colloidal crystal self-assembly. The devices show an optical modulation on-off ratio of up to 12.7 dB and an insertion loss down to 3.3 dB at 2200 nm wavelength in static heating experiments, and a ΔT/T of 10% in ultrafast pump-probe experiments. In particular, by judiciously aligning the surface plasmon resonance wavelength to the pump wavelength of the femtosecond laser, the enhanced electric field at 800 nm is capable of switching off the extraordinary optical transmission effect at 2200 nm on a 100 fs time scale. Compared to VO2 thin-film samples, the devices also show a 50% power reduction for all-optical modulation. Our work provides a practical way to fabricate large-scale and power-efficient active metasurfaces for ultrafast optical modulation.


2019 ◽  
Vol 3 (4) ◽  
pp. 902-904
Author(s):  
Alexander Peyser ◽  
Sandra Diaz Pier ◽  
Wouter Klijn ◽  
Abigail Morrison ◽  
Jochen Triesch

Large-scale in silico experimentation depends on the generation of connectomes beyond available anatomical structure. We suggest that linking research across the fields of experimental connectomics, theoretical neuroscience, and high-performance computing can enable a new generation of models bridging the gap between biophysical detail and global function. This Focus Feature on "Linking Experimental and Computational Connectomics" aims to bring together some examples from these domains as a step toward the development of more comprehensive generative models of multiscale connectomes.


2021 ◽  
Author(s):  
Vahid Arabnejad

Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, be they within a high-performance computing cluster or, more recently, within the cloud. Commercial clouds have increasingly become a viable platform for hosting scientific analyses and computation due to their elasticity, recent introduction of specialist hardware, and pay-as-you-go cost model. This computing paradigm therefore presents a low-capital, low-barrier alternative to operating dedicated eScience infrastructure. Indeed, commercial clouds now enable universal access to capabilities previously available only to large, well-funded research groups. While the potential benefits of cloud computing are clear, there are still significant technical hurdles associated with obtaining the best execution efficiency while trading off cost. In most cases, large-scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular its elasticity, its pricing models (both static and dynamic), its non-homogeneous resource types, and its vast array of services. This mapping of workflow tasks onto a set of provisioned instances is an instance of the general scheduling problem and is NP-complete. In addition, certain runtime constraints must be met, the most typical being the cost of the computation and the time that computation requires to complete. This thesis addresses the scientific workflow scheduling problem in the cloud: scheduling workflow tasks on cloud resources so that users meet their defined constraints, such as budget and deadline, while providers maximize profits and resource utilization. Moreover, it explores different mechanisms and strategies for distributing the defined constraints over a workflow and investigates their impact on the overall cost of the resulting schedule.
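
One common strategy in this problem family, sketched below under stated assumptions, is to distribute the workflow deadline over its tasks and then pick the cheapest instance type that meets each task's sub-deadline. This is an illustration of the deadline-distribution idea, not the thesis's specific algorithm; the instance names, speeds, and prices are hypothetical.

```python
# Hypothetical instance catalogue: (name, relative speed, $ per hour).
INSTANCES = [
    ("small", 1.0, 0.10),
    ("medium", 2.0, 0.25),
    ("large", 4.0, 0.60),
]

def schedule(tasks, deadline):
    """tasks: list of (name, base_runtime_hours) in topological order."""
    total = sum(rt for _, rt in tasks)
    plan, cost = [], 0.0
    for name, rt in tasks:
        sub_deadline = deadline * rt / total     # proportional deadline share
        # Cheapest instance fast enough for this task's share of the deadline.
        for inst, speed, price in sorted(INSTANCES, key=lambda i: i[2]):
            runtime = rt / speed
            if runtime <= sub_deadline:
                plan.append((name, inst, runtime))
                cost += runtime * price          # pay-as-you-go billing
                break
        else:
            return None, None                    # infeasible with this split
    return plan, cost

plan, cost = schedule([("align", 4.0), ("sort", 2.0), ("call", 6.0)], deadline=6.0)
print(plan, cost)
```

How the deadline is split over tasks (proportionally here) is exactly the kind of distribution strategy whose cost impact the thesis investigates.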


2012 ◽  
Vol 8 (4) ◽  
pp. 102 ◽  
Author(s):  
Claudia Canali ◽  
Riccardo Lancellotti

The recent growth in demand for modern applications, combined with the shift to the cloud computing paradigm, has led to the establishment of large-scale cloud data centers. The increasing size of these infrastructures represents a major challenge in terms of monitoring and management of the system resources. Available solutions typically consider every Virtual Machine (VM) as a black box, each with independent characteristics, and face scalability issues by reducing the number of monitored resource samples, in most cases considering only average CPU usage sampled at a coarse time granularity. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper we propose an automated methodology to cluster VMs depending on their usage of multiple resources, both system- and network-related, assuming no knowledge of the services executed on them. This innovative methodology exploits the correlation between resource usage to cluster together similar VMs. We evaluate the methodology through a case study with data coming from an enterprise datacenter, and we show that high performance may be achieved in automatic VM clustering. Furthermore, we estimate the reduction in the amount of data collected, showing that our proposal may simplify the monitoring requirements and help administrators take decisions on the resource management of cloud computing datacenters.
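
A minimal sketch of this correlation-based idea, as I read it: describe each VM by the pairwise correlations between its resource-usage time series, then cluster VMs whose correlation patterns are similar. The synthetic data and the choice of k-means are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def vm_features(samples):
    """samples: (n_resources, n_samples) usage series for one VM.
    Feature vector = upper triangle of the resource-correlation matrix."""
    corr = np.corrcoef(samples)
    return corr[np.triu_indices_from(corr, k=1)]

rng = np.random.default_rng(1)
web_like, db_like = [], []
for _ in range(10):                                # CPU tracks network traffic
    cpu = rng.random(200)
    net = cpu + 0.05 * rng.standard_normal(200)
    web_like.append(np.vstack([cpu, net, rng.random(200)]))
for _ in range(10):                                # CPU tracks disk I/O
    cpu = rng.random(200)
    disk = cpu + 0.05 * rng.standard_normal(200)
    db_like.append(np.vstack([cpu, rng.random(200), disk]))

X = np.array([vm_features(vm) for vm in web_like + db_like])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # the two behavioural groups separate cleanly
```

Once VMs are grouped this way, a monitoring system can sample a few representatives per cluster instead of every VM, which is the source of the data-collection reduction the paper estimates.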


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
E. A. Huerta ◽  
Asad Khan ◽  
Edward Davis ◽  
Colleen Bushell ◽  
William D. Gropp ◽  
...  

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches that enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry, and that play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high-performance computing (HPC) to reduce time-to-insight, and to enable a systematic study of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article we present a summary of recent developments in this field and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
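
The workhorse pattern behind moving beyond single-GPU training is data parallelism: replicate the model across GPUs, shard each batch, and average gradients. The sketch below uses PyTorch's standard DistributedDataParallel; the model, data, and hyperparameters are stand-ins, and it assumes CUDA GPUs with NCCL, launched via torchrun.

```python
# Minimal data-parallel training sketch (standard PyTorch DDP; illustrative
# model and data, not any specific system from the article).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")               # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(1024, 10).cuda(local_rank)  # stand-in network
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(100):                          # each rank sees its own shard
        x = torch.randn(64, 1024, device=local_rank)
        y = torch.randint(0, 10, (64,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                           # gradients all-reduced here
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=<gpus> this_script.py
```

Scaling this pattern to HPC-class machines, and coupling it with domain-inspired architectures, is the confluence the article surveys.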


2008 ◽  
Vol 3 (1) ◽  
pp. 32-38
Author(s):  
Enric Musoll ◽  
Mario Nemirovsky

High-performance single-threaded processors achieve their performance goals partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware supporting these techniques usually occupies a large portion of the overall processor real estate, and therefore consumes a significant amount of power that is sometimes not optimally directed toward useful work. In this work, we study the intuitive fact that architectures with hardware support for threads are more power efficient than a more traditional single-threaded superscalar architecture. Toward this goal, we have created a model of the power, performance, and area of several parallel architectures. This model shows that a parallel architecture can be designed so that (a) it requires less area and power to reach the same performance, (b) it achieves better power efficiency and less area for the same power budget, or (c) it has higher performance and better power efficiency under the same area constraint, when compared to a single-threaded superscalar architecture.
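
The intuition can be shown with a toy analytic model in the spirit of the paper's comparison; the exponents and core counts below are my own illustrative assumptions, not the authors' model. The premise is that a single core's power and area grow superlinearly with its performance, so several simpler cores can match throughput more cheaply when thread-level parallelism is available.

```python
# Toy power/performance/area model (assumed exponents, not the paper's).
ALPHA = 1.75   # assumption: power ~ perf**ALPHA for one core
BETA = 1.5     # assumption: area  ~ perf**BETA  for one core

def single_threaded(perf):
    return {"perf": perf, "power": perf**ALPHA, "area": perf**BETA}

def multi_threaded(n_cores, perf_per_core, efficiency=0.9):
    # efficiency < 1 models imperfect thread scaling
    return {"perf": efficiency * n_cores * perf_per_core,
            "power": n_cores * perf_per_core**ALPHA,
            "area": n_cores * perf_per_core**BETA}

big = single_threaded(4.0)        # one aggressive superscalar core
small = multi_threaded(8, 0.6)    # eight simple cores
for name, chip in [("superscalar", big), ("parallel", small)]:
    print(name, chip, "perf/W = %.2f" % (chip["perf"] / chip["power"]))
```

With these numbers the parallel design wins on performance, power, and area simultaneously, matching the shape of the paper's conclusions (a) through (c); the real model of course depends on the workload's available parallelism.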


2013 ◽  
Vol 346 ◽  
pp. 117-122
Author(s):  
Wen Chuan Yang ◽  
Guang Jie Lin ◽  
Jiang Yong Wang

Accompanying the widespread use of intelligent traffic systems in China, all traffic input data streams to the Traffic Surveillance Center (TSC). Some metropolitan TSCs, such as Beijing's, produce up to 18 million records and 1 TB of image data every hour. Normally, the job of the TSC is to monitor and retain data. There is a tendency to put more capability into the TSC, such as ad-hoc queries for clone car identification and feedback of abnormal traffic information. We therefore need to think about what can be kept in working storage and how to analyze it. Clearly, an ordinary database cannot handle such massive datasets and complex ad-hoc queries. MapReduce is a popular and widely used fine-grain parallel runtime developed for high-performance processing of large-scale datasets. In this paper, we propose CarMR, a MapReduce clone car identification system based on the Hive/Hadoop frameworks. The distributed file system HDFS is used in CarMR for fast data sharing and querying. CarMR supports fast location of clone cars and also optimizes the route to catch fugitives. Our results show that the model achieves high efficiency.
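
A plain-Python map/reduce pair illustrating the clone-car computation as I read it (CarMR itself runs on Hive/Hadoop; the speed threshold and record format are invented): a plate seen at two checkpoints farther apart than any real car could travel in the elapsed time is a clone suspect.

```python
from collections import defaultdict

MAX_KMH = 150.0   # assumed plausible top speed between checkpoints

def map_phase(records):
    """records: (plate, hour_of_day, checkpoint_km_marker) camera sightings."""
    for plate, t, km in records:
        yield plate, (t, km)

def reduce_phase(pairs):
    by_plate = defaultdict(list)
    for plate, obs in pairs:
        by_plate[plate].append(obs)
    for plate, obs in by_plate.items():
        obs.sort()                                 # order sightings by time
        for (t1, km1), (t2, km2) in zip(obs, obs[1:]):
            hours = max(t2 - t1, 1e-6)
            if abs(km2 - km1) / hours > MAX_KMH:   # physically impossible trip
                yield plate, (t1, km1), (t2, km2)

records = [("A123", 8.0, 0.0), ("A123", 8.5, 200.0),   # 400 km/h: clone
           ("B456", 8.0, 0.0), ("B456", 9.0, 90.0)]    # 90 km/h: fine
for hit in reduce_phase(map_phase(records)):
    print("clone suspect:", hit)
```

Grouping by plate is exactly the shuffle key a Hive/Hadoop job would use, which is why the problem maps naturally onto MapReduce at TSC scale.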


Author(s):  
Zhi Shang

Simulations of environmental flood issues usually face the scalability problem of large-scale parallel computing. A plain parallel technique based on pure MPI struggles to achieve good scalability due to the large number of domain partitions. Therefore, hybrid programming using MPI and OpenMP is introduced to deal with the scalability issue. This parallel technique plays to the respective strengths of MPI and OpenMP: during the parallel computation, OpenMP is employed for its efficient fine-grain parallelism, while MPI performs the coarse-grain parallel domain partitioning and handles data communication. In our tests, hybrid MPI/OpenMP parallel programming was used to renovate the finite element solvers in the BIEF library of Telemac. We found that the hybrid programming helps Telemac deal with the scalability issue.
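
A small Python/mpi4py analogue of the hybrid pattern, under stated assumptions (the actual work modifies Telemac's Fortran BIEF solvers with MPI and OpenMP directives): MPI ranks own coarse-grain subdomains and exchange halo values, while node-local fine-grain work is a vectorized kernel standing in for OpenMP-threaded loops.

```python
# Run with: mpirun -n 4 python this_script.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1_000_000                        # each rank's slice of a 1-D domain
u = np.full(n_local, float(rank))          # local subdomain values

# Coarse grain (MPI): exchange one-cell halos with periodic neighbours.
left_halo = comm.sendrecv(u[-1], dest=(rank + 1) % size,
                          source=(rank - 1) % size)
right_halo = comm.sendrecv(u[0], dest=(rank - 1) % size,
                           source=(rank + 1) % size)

# Fine grain: a vectorized smoothing stencil over the local slice
# (standing in for an OpenMP-parallel loop in the Fortran solver).
new = np.empty_like(u)
new[1:-1] = 0.25 * (u[:-2] + 2.0 * u[1:-1] + u[2:])
new[0] = 0.25 * (left_halo + 2.0 * u[0] + u[1])
new[-1] = 0.25 * (u[-2] + 2.0 * u[-1] + right_halo)
u = new
print(f"rank {rank}: local mean = {u.mean():.4f}")
```

The point of the hybrid split is that only subdomain boundaries cross the network, while the bulk of the arithmetic stays in shared memory on each node.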


2014 ◽  
Vol 548-549 ◽  
pp. 1853-1856 ◽  
Author(s):  
Wen Chuan Yang ◽  
He Chen ◽  
Qing Yi Qu

Normally, the job of the Traffic Data Processing Center (TDPC) is to monitor and retain data. There is a tendency to put more capability into the TDPC, such as ad-hoc queries for speeding car identification and feedback of abnormal traffic information. We therefore need to think about what can be kept in working storage and how to analyze it. Clearly, an ordinary database cannot handle such massive datasets and complex ad-hoc queries. MapReduce is a popular and widely used fine-grain parallel runtime developed for high-performance processing of large-scale datasets. In this paper, we propose MRTP, a MapReduce traffic processing system based on the Hive/Hadoop frameworks. The distributed file system HDFS is used in MRTP for fast data sharing and querying. MRTP supports fast location of speeding cars and also optimizes the route to catch fugitives. Our results show that the model achieves high efficiency.
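
The "optimize the route to catch a fugitive" step that both systems mention reduces to shortest paths on the road network. Below is a minimal Dijkstra sketch from a patrol position to a car's last-seen gantry; the graph, node names, and travel times are illustrative, not the paper's data.

```python
import heapq

def dijkstra(graph, start, goal):
    """graph: {node: [(neighbour, minutes), ...]}. Returns (minutes, path)."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

roads = {"patrol": [("J1", 4), ("J2", 7)],
         "J1": [("gantry17", 6)],
         "J2": [("gantry17", 2)]}
print(dijkstra(roads, "patrol", "gantry17"))  # (9.0, ['patrol', 'J2', 'gantry17'])
```

In a deployed system the edge weights would come from the live traffic feeds the TDPC already ingests, so the same platform serves both detection and interception.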

