parallel architecture
Recently Published Documents


Total documents: 1075 (five years: 117)
H-index: 32 (five years: 4)

2022 ◽  
Vol 18 (2) ◽  
pp. 1-25
Author(s):  
Saransh Gupta ◽  
Mohsen Imani ◽  
Joonseop Sim ◽  
Andrew Huang ◽  
Fan Wu ◽  
...  

Stochastic computing (SC) reduces the complexity of computation by representing numbers with long streams of independent bits. However, increasing performance in SC comes with either an increase in area or a loss in accuracy. Processing in memory (PIM) computes data in place while offering high memory density and supporting bit-parallel operations with low energy consumption. In this article, we propose COSMO, an architecture for computing with stochastic numbers in memory, which enables SC in memory. The proposed architecture is general and can be used for a wide range of applications. It is a highly dense and parallel architecture that supports most SC encodings and operations in memory. It maximizes the performance and energy efficiency of SC by introducing several innovations: (i) in-memory parallel stochastic number generation, (ii) efficient implication-based logic in memory, (iii) novel memory bit line segmenting, (iv) a new memory-compatible SC addition operation, and (v) flexible block allocation. To show the generality and efficiency of our stochastic architecture, we implement image processing, deep neural networks (DNNs), and hyperdimensional (HD) computing on the proposed hardware. Our evaluations show that running DNN inference on COSMO is 141× faster and 80× more energy efficient than on a GPU.
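
As a rough illustration of the bit-stream representation described in this abstract, the following sketch encodes two probabilities as streams of independent bits and multiplies them with a bitwise AND, which is the standard unipolar SC multiplication. It is a plain software model, not COSMO's in-memory implementation; the function names, stream length, and seed are assumptions chosen for the example.

```python
import random


def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as n independent random bits (unipolar SC)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def from_stream(bits):
    """Decode a bit stream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [x & y for x, y in zip(a_bits, b_bits)]


# Hypothetical settings for the illustration: a fixed seed and a long stream.
rng = random.Random(0)
n = 100_000
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)

# The decoded product approximates 0.5 * 0.4; the error shrinks as n grows.
product = from_stream(sc_multiply(a, b))
print(product)
```

Shortening the streams makes each operation cheaper but raises the variance of the decoded result, which is the accuracy trade-off the abstract refers to.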


2022 ◽  
Author(s):  
Nelson Kingsley Joel Peter Thiagarajan ◽  
Vijeyakumar K N ◽  
Saravanakumar S

Approximate computing is a modern technique for designing low-power, efficient arithmetic circuits for portable, error-resilient applications. In this work, we propose an Adaptive Parallel Mid-Point Filter (APMPF) architecture, built on a proposed imprecise Max-Min Estimator (MME), targeting digital image processing. Parallel architectures for the MME that trade accuracy for reduced hardware are proposed and used in the APMPF. The APMPF uses three levels of sorting to estimate the mid-point of a 3 × 3 window. A switching-based trimmed filter is proposed for precise estimation of the selected window. Experimental results for area, power, and delay in a 90 nm ASIC technology show that the proposed filters achieve at least 7% and 9% reductions in Area Delay Product (ADP) and Power Delay Product (PDP), respectively, compared with the precise filter design.
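
For reference, a mid-point filter replaces each pixel with the average of the maximum and minimum values in its window. The sketch below is an exact software version for a 3 × 3 window, not the approximate MME hardware or the three-level sorting network of the paper; the function names, the integer rounding, and the choice to copy border pixels unchanged are assumptions made for the example.

```python
def midpoint_3x3(image, r, c):
    """Return the mid-point (max + min) // 2 of the 3 x 3 window centred on (r, c)."""
    window = [image[i][j] for i in range(r - 1, r + 2) for j in range(c - 1, c + 2)]
    return (max(window) + min(window)) // 2


def midpoint_filter(image):
    """Apply the mid-point filter to interior pixels; border pixels are copied as-is."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = midpoint_3x3(image, r, c)
    return out


# A tiny grayscale image with one bright pixel in the corner.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 90],
]
filtered = midpoint_filter(image)
print(filtered[1][1])  # (90 + 1) // 2
```

Each window is processed independently of the others, which is what makes the filter a natural fit for a parallel hardware architecture.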


2022 ◽  
Vol 4 ◽  
Author(s):  
Neil Cohn ◽  
Joost Schilperoord

Language is typically embedded in multimodal communication, yet models of linguistic competence do not often incorporate this complexity. Meanwhile, speech, gesture, and/or pictures are each considered as indivisible components of multimodal messages. Here, we argue that multimodality should not be characterized by whole interacting behaviors, but by interactions of similar substructures which permeate across expressive behaviors. These structures comprise a unified architecture and align within Jackendoff's Parallel Architecture: a modality, meaning, and grammar. Because this tripartite architecture persists across modalities, interactions can manifest within each of these substructures. Interactions between modalities alone create correspondences in time (ex. speech with gesture) or space (ex. writing with pictures) of the sensory signals, while multimodal meaning-making balances how modalities carry “semantic weight” for the gist of the whole expression. Here we focus primarily on interactions between grammars, which contrast across two variables: symmetry, related to the complexity of the grammars, and allocation, related to the relative independence of interacting grammars. While independent allocations keep grammars separate, substitutive allocation inserts expressions from one grammar into those of another. We show that substitution operates in interactions between all three natural modalities (vocal, bodily, graphic), and also in unimodal contexts within and between languages, as in codeswitching. Altogether, we argue that unimodal and multimodal expressions arise as emergent interactive states from a unified cognitive architecture, heralding a reconsideration of the “language faculty” itself.


Author(s):  
Heru Dibyo Laksono ◽  
Novizon Novizon ◽  
Melda Latif ◽  
Eko Amri Gunawan ◽  
Reri Afrianita

This paper describes the design and response analysis of single and cascade controllers for a direct current (DC) Automatic Voltage Regulator (AVR) system. The DC AVR system is represented in the form of a transfer function. The single and cascade controllers are designed with a parallel architecture in MATLAB against predetermined design criteria. The controller types considered are Proportional Differential (PD), Proportional Integral (PI), Proportional Integral Differential (PID), Proportional Differential with a First Order Filter in the Differential Section (PDF), and Proportional Integral Differential with a First Order Filter in the Differential Section (PIDF). For the transient analysis, the observed parameters are rise time, peak time, settling time, maximum overshoot, and peak value. The analysis shows that the controllers meeting the design criteria, for both the single and cascade configurations, are the PD controller and the PDF controller. For the single PD controller, the Proportional constant (Kp) is 0.6280 and the Differential constant (KD) is 0.1710. For the single PDF controller, Kp is 0.6130, KD is 0.1710, and the filter constant (Tf) is 0.0009. For the cascade PD controller, the inner-loop controller (C2) has Kp of 1.7300 and KD of 0.0242, and the outer-loop controller (C1) has Kp of 179,000 and KD of 2.4600.
For the cascade PDF controller, the inner-loop controller (C2) has Kp of 1.5900, KD of 0.0246, and Tf of 0.0018, and the outer-loop controller (C1) has Kp of 134,0000, KD of 2.2900, and Tf of 0.00008.
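
To make the role of the three PDF constants concrete, the sketch below evaluates a PD controller with a first-order derivative filter in the parallel form C(s) = Kp + KD·s / (Tf·s + 1), which is the parallel form MATLAB uses for filtered derivative terms. That the paper uses exactly this form is an assumption; the function name is hypothetical, and the numeric values are the single-controller PDF constants quoted in the abstract.

```python
def pdf_controller(s, kp, kd, tf):
    """Evaluate C(s) = kp + kd*s / (tf*s + 1) at a complex frequency s.

    Assumed parallel form: a proportional term plus a derivative term
    passed through a first-order filter with time constant tf.
    """
    return kp + kd * s / (tf * s + 1)


# Single-controller PDF constants quoted in the abstract.
KP, KD, TF = 0.6130, 0.1710, 0.0009

# At zero frequency only the proportional term remains.
dc_gain = pdf_controller(0j, KP, KD, TF)

# At very high frequency the filter caps the derivative gain at KD / TF.
hf_gain = pdf_controller(1j * 1e9, KP, KD, TF)

print(abs(dc_gain), abs(hf_gain))
```

Without the filter (Tf = 0) the derivative gain would grow without bound as frequency increases; with it, the gain levels off near Kp + KD/Tf, which limits amplification of high-frequency noise.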


Mathematics ◽  
2021 ◽  
Vol 9 (24) ◽  
pp. 3278
Author(s):  
Petr Pařík ◽  
Jin-Gyun Kim ◽  
Martin Isoz ◽  
Chang-uk Ahn

The enhanced Craig–Bampton (ECB) method is a novel extension of the original Craig–Bampton (CB) method, which has been widely used for component mode synthesis (CMS). By using the residual modal compensation that is neglected in the CB method, the ECB method dramatically improves the accuracy of the reduced matrices without increasing the size of the eigenbasis. However, it also incurs additional computational cost to treat the residual flexibility. In this paper, an efficient parallelization of the ECB method is presented to handle this issue and extend its applicability to large-scale structural vibration problems. A new ECB formulation within a substructuring strategy is derived to achieve better scalability. The parallel implementation is based on the OpenMP parallel architecture. METIS graph partitioning and the Linear Algebra Package (LAPACK) are used for automated algebraic partitioning and computational linear algebra, respectively. Numerical examples are presented to evaluate the accuracy, scalability, and capability of the proposed parallel ECB method. Based on this work, one can expect efficient computation with the ECB method as well as improved accuracy.


SPE Journal ◽  
2021 ◽  
pp. 1-20
Author(s):  
A. M. Manea ◽  
T. Almani

In this work, the scalability of two key multiscale solvers for the pressure equation arising from incompressible flow in heterogeneous porous media, namely the multiscale finite volume (MSFV) solver and the restriction-smoothed basis multiscale (MsRSB) solver, is investigated on the massively parallel graphics processing unit (GPU) architecture. The robustness and scalability of both solvers are compared against their corresponding carefully optimized implementations on the shared-memory multicore architecture in a structured problem setting. Although several components in the MSFV and MsRSB algorithms are directly parallelizable, their scalability on the GPU architecture depends heavily on the underlying algorithmic details and data-structure design of every step, where one needs to ensure favorable control and data flow on the GPU while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Thus, we extend the work on the parallel multiscale methods of Manea et al. (2016) to map the MSFV and MsRSB special kernels to the massively parallel GPU architecture. The scalability of our optimized parallel MSFV and MsRSB GPU implementations is demonstrated using highly heterogeneous structured 3D problems derived from the SPE10 Benchmark (Christie and Blunt 2001). Those problems range in size from millions to tens of millions of cells. For both solvers, the multicore implementations are benchmarked on a shared-memory multicore architecture consisting of two packages of Intel® Cascade Lake Xeon Gold 6246 central processing units (CPUs), whereas the GPU implementations are benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multicore implementations to the GPU implementations for both the setup and solution stages.
Finally, we compare the parallel MsRSB scalability to the scalability of MSFV on the multicore (Manea et al. 2016) and GPU architectures. To the best of our knowledge, this is the first parallel implementation and demonstration of these versatile multiscale solvers on the GPU architecture. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Conference Special Issue.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7933
Author(s):  
António Silva ◽  
Duarte Fernandes ◽  
Rafael Névoa ◽  
João Monteiro ◽  
Paulo Novais ◽  
...  

Research on deep learning for object detection in LiDAR data has grown rapidly in recent years, achieving notable developments, namely in precision and inference speed. These improvements have been facilitated by powerful GPU servers, taking advantage of their capacity to train the networks in reasonable periods and their parallel architecture that allows for high performance and real-time inference. However, these features are limited in autonomous driving due to space, power capacity, and inference time constraints, and onboard devices are not as powerful as their counterparts used for training. This paper investigates the use of a deep learning-based method in edge devices for onboard real-time inference that is power-efficient and has low space demands. A methodology is proposed for deploying high-end GPU-specific models in edge devices for onboard inference, consisting of a two-fold flow: studying the implications of model hyperparameters for meeting application requirements, and compressing the network to meet the board resource limitations. A hybrid FPGA-CPU board is proposed as an effective onboard inference solution by comparing its performance on the KITTI dataset with that of PC-based implementations. The achieved accuracy is comparable to that of the PC-based deep learning method, with the added benefit of being better suited to real-time inference under power and space constraints.


2021 ◽  
Author(s):  
Abdulrahman Manea

Due to its simplicity, adaptability, and applicability to various grid formats, the restriction-smoothed basis multiscale method (MsRSB) (Møyner and Lie 2016) has received wide attention and has been extended to various flow problems in porous media. Unlike the standard multiscale methods, MsRSB relies on iterative smoothing to find the multiscale basis functions in an adaptive manner, giving it the ability to naturally adjust to various complex grid orientations often encountered in real-life industrial applications. In this work, we investigate the scalability of MsRSB on various state-of-the-art parallel architectures, including multi-core systems and GPUs. While MsRSB is, like most other multiscale methods, directly amenable to parallelization, the dependence on a smoother to find the basis functions creates unique control- and data-flow patterns. These patterns require careful design and implementation in parallel environments to achieve good scalability. We extend the work on parallel multiscale methods in Manea et al. (2016) and Manea and Almani (2019) to map the MsRSB special kernels to the shared-memory parallel multi-core and GPU architectures. The scalability of our optimized parallel MsRSB implementation is demonstrated using highly heterogeneous 3D problems derived from the SPE10 Benchmark (Christie and Blunt 2001). Those problems range in size from millions to tens of millions of cells. The multi-core implementation is benchmarked on a shared-memory multi-core architecture consisting of two packages of Intel's Cascade Lake Xeon® Gold 6246 CPU, while the GPU implementation is benchmarked on a massively parallel architecture consisting of Nvidia Volta V100 GPUs. We compare the multi-core implementation to the GPU implementation for both the setup and solution stages. To the best of our knowledge, this is the first parallel implementation and demonstration of the versatile MsRSB method on the GPU architecture.
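
The iterative smoothing idea mentioned in this abstract can be illustrated with a deliberately simplified 1D sketch: each coarse block starts with an indicator basis function, which is then relaxed by damped Jacobi sweeps on a homogeneous 1D Laplacian with no-flow ends. This is not the authors' implementation. It omits the support-region restriction and renormalization of the published method as well as heterogeneous coefficients, and every name and parameter (grid size, block size, damping factor 2/3) is an assumption for the example.

```python
def laplacian_1d_apply(v):
    """Apply a 1D two-point Laplacian with no-flow ends to v.

    Returns (A @ v, diag(A)). Every row of A sums to zero.
    """
    n = len(v)
    out, diag = [], []
    for i in range(n):
        acc, d = 0.0, 0.0
        if i > 0:
            acc += v[i] - v[i - 1]
            d += 1.0
        if i < n - 1:
            acc += v[i] - v[i + 1]
            d += 1.0
        out.append(acc)
        diag.append(d)
    return out, diag


def smooth_basis(n_cells, block, iters, omega=2.0 / 3.0):
    """Smooth block-indicator basis functions with damped Jacobi sweeps."""
    n_blocks = n_cells // block
    # Initial basis: 1 inside the coarse block, 0 elsewhere.
    basis = [[1.0 if i // block == j else 0.0 for i in range(n_cells)]
             for j in range(n_blocks)]
    for _ in range(iters):
        # Each basis function is updated independently of the others,
        # so this loop is the natural unit of parallel work.
        for j, p in enumerate(basis):
            ap, diag = laplacian_1d_apply(p)
            basis[j] = [p[i] - omega * ap[i] / diag[i] for i in range(n_cells)]
    return basis


basis = smooth_basis(n_cells=12, block=4, iters=5)

# Because the rows of A sum to zero, the basis functions still sum to one.
column_sums = [sum(b[i] for b in basis) for i in range(12)]
print(column_sums)
```

In this simplified setting the updates preserve the partition of unity automatically. The independent per-basis-function updates are easy to distribute, while the data dependencies inside each sweep hint at the control- and data-flow patterns the abstract describes as requiring care.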

