Outperforming Sequential Full-Word Long Addition with Parallelization and Vectorization

Author(s):  
Andrey Chusov

The paper presents algorithms for parallel and vectorized full-word addition of big unsigned integers with carry propagation. Because of that propagation, software parallelization and vectorization of non-polynomial addition of big integers have long been considered impractical due to the data dependencies between digits of the operands. The presented algorithms are based on parallel and vectorized detection of carry origins within the elements of vector operands, masking of the bits which correspond to those elements, and subsequent scalar addition of the resulting integers. The acquired bits can then be taken into account to adjust the sum using the Kogge-Stone method. Essentially, the paper formalizes and experimentally verifies a parallel and vectorized implementation of carry-lookahead adders applied at arbitrary granularity of data. This approach is noticeably beneficial for manycore and CUDA implementations, and for vectorized implementations using AVX-512 with masked instructions.
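As a rough illustration of the limb-level carry-lookahead idea, the following Python sketch computes per-limb generate/propagate flags and resolves carries with a Kogge-Stone parallel prefix. It is not the paper's AVX-512 or CUDA code; the 64-bit limb width and all names are assumptions for illustration only.

```python
# Limb-level carry-lookahead addition of big unsigned integers, Kogge-Stone
# style: compute all limb sums independently, then resolve carries with a
# logarithmic-depth prefix over generate/propagate flags.
MASK = (1 << 64) - 1  # assumed 64-bit limbs, stored little-endian

def add_bigint(a, b):
    n = len(a)
    s = [(x + y) & MASK for x, y in zip(a, b)]     # limb sums, carries dropped
    g = [int(x + y > MASK) for x, y in zip(a, b)]  # limb generates a carry out
    p = [int(x == MASK) for x in s]                # limb propagates a carry in
    # Kogge-Stone prefix: after ceil(log2(n)) rounds, g[i] holds the carry
    # *out* of limb i, assuming zero carry into limb 0.
    d = 1
    while d < n:
        g = [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)]
        p = [p[i] & p[i - d] if i >= d else p[i] for i in range(n)]
        d *= 2
    # The carry into limb i is the carry out of limb i - 1.
    out = [(s[i] + (g[i - 1] if i > 0 else 0)) & MASK for i in range(n)]
    return out, g[n - 1]                           # sum limbs and final carry
```

Each round of the prefix loop corresponds to one masked vector step in a SIMD realization, which is where the data-parallel benefit comes from.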

Micromachines ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 183
Author(s):  
Jose Ricardo Gomez-Rodriguez ◽  
Remberto Sandoval-Arechiga ◽  
Salvador Ibarra-Delgado ◽  
Viktor Ivan Rodriguez-Abdala ◽  
Jose Luis Vazquez-Avila ◽  
...  

Current computing platforms encourage the integration of thousands of processing cores, and their interconnections, into a single chip. Mobile smartphones, IoT, embedded devices, desktops, and data centers use Many-Core Systems-on-Chip (SoCs) to exploit their compute power and parallelism to meet dynamic workload requirements. Networks-on-Chip (NoCs) provide scalable connectivity for diverse applications with distinct traffic patterns and data dependencies. However, when the system executes various applications on traditional NoCs—optimized and fixed at synthesis time—the mismatch between the interconnection and the applications' requirements limits performance. In the literature, NoC designs have embraced the Software-Defined Networking (SDN) strategy to evolve into an adaptable interconnection solution for future chips. However, the works surveyed implement a partial Software-Defined Network-on-Chip (SDNoC) approach, leaving aside the SDN layered architecture that brings interoperability to conventional networking. This paper explores the SDNoC literature and classifies it with respect to the desired SDN features that each work presents. We then describe the challenges and opportunities identified in the survey. Moreover, we explain the motivation for an SDNoC approach, and we present both SDN and SDNoC concepts and architectures. We observe that works in the literature employ an incomplete layered SDNoC approach. This fact creates various fertile areas in the SDNoC architecture where researchers may contribute to Many-Core SoC designs.


Author(s):  
Chafik Arar ◽  
Mohamed Salah Khireddine

The paper proposes a new reliable fault-tolerant scheduling algorithm for real-time embedded systems. The proposed algorithm is based on static scheduling that includes task dependencies, task execution costs, and data dependencies in its scheduling decisions. The algorithm is dedicated to multi-bus heterogeneous architectures with multiple processors linked by several shared buses. It considers a single bus fault, caused by a hardware failure and compensated for by software redundancy. The algorithm uses both active and passive backup copies to minimize the scheduling length of data on the buses. In the experiments, the proposed methods are evaluated in terms of data scheduling length for a set of DSP benchmarks. The experimental results show the effectiveness of our technique.
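The primary/backup idea behind such schedulers can be sketched minimally: each data transfer receives a primary slot on the earliest-free bus and a passive backup slot reserved on a different bus, used only if the primary bus fails. This is a hypothetical illustration of the general technique, not the authors' exact algorithm; all names are invented.

```python
# Toy primary/passive-backup scheduler for data transfers over shared buses.
# Each transfer is (name, cost); the plan records the primary bus, its start
# time, and the bus reserved for the passive backup copy.
def schedule(transfers, n_buses):
    free = [0.0] * n_buses                    # next free time on each bus
    plan = {}
    for name, cost in transfers:
        primary = min(range(n_buses), key=lambda b: free[b])
        backup = min((b for b in range(n_buses) if b != primary),
                     key=lambda b: free[b])   # passive copy on a different bus
        plan[name] = (primary, free[primary], backup)
        free[primary] += cost                 # only the active copy consumes time
    return plan
```

Because the backup is passive, it reserves no bus time unless a fault occurs, which is what keeps the overall scheduling length short in the fault-free case.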


Author(s):  
Srikanth M. Kannapan ◽  
Dean L. Taylor

Abstract Naive interpretations of concurrent engineering may expect extreme parallelization of tasks and simultaneous accommodation of multiple perspectives. In fact, from our efforts at modeling tasks in a MEMS (Micro-Electro-Mechanical Systems) pressure sensor design project, it appears that data dependencies due to the structure of tasks and the product itself result in scenarios of decision and action that must be carefully coordinated. This paper refines a previously described information model for defining evolving contexts of product model aspects and team member perspectives, with software agents acting on behalf of team members to execute tasks. The pressure sensor design project is analyzed in the framework of the information model. A scenario of decision and action for design of the pressure sensor is modeled as a design process plan. Conflict on a shared parameter occurs as a consequence of introducing some parallelism between the capacitance and deflection agents in the process. We present a technique for negotiating such conflicts by definition and propagation of utility functions on decision parameters and axiomatic negotiation.


2009 ◽  
Vol 27 (4) ◽  
pp. 1377-1386 ◽  
Author(s):  
M. Antón ◽  
D. Loyola ◽  
M. López ◽  
J. M. Vilaplana ◽  
M. Bañón ◽  
...  

Abstract. The main objective of this article is to compare the total ozone data from the new Global Ozone Monitoring Experiment instrument (GOME-2/MetOp) with reliable ground-based measurements recorded by five Brewer spectroradiometers in the Iberian Peninsula. In addition, a similar comparison for the predecessor instrument GOME/ERS-2 is described. The period of study is a whole year, from May 2007 to April 2008. The results show that GOME-2/MetOp ozone data already have very good quality: total ozone columns are on average 3.05% lower than Brewer measurements. This underestimation is higher than that obtained for GOME/ERS-2 (1.46%). However, the relative differences between GOME-2/MetOp and Brewer measurements show significantly lower variability than the differences between GOME/ERS-2 and Brewer data. Dependencies of these relative differences with respect to the satellite solar zenith angle (SZA), the satellite scan angle, the satellite cloud cover fraction (CF), and the ground-based total ozone measurements are analyzed. For both GOME instruments, the differences show no significant dependence on SZA. However, GOME-2/MetOp data show a significant dependence on the satellite scan angle (+1.5%). In addition, GOME/ERS-2 differences present a clear dependence with respect to the CF and the ground-based total ozone; such differences are minimized for GOME-2/MetOp. The comparison between the daily total ozone values provided by both GOME instruments shows that GOME-2/MetOp ozone data are on average 1.46% lower than GOME/ERS-2 data, without any seasonal dependence. Finally, deviations of the a priori climatological ozone profile used by the satellite retrieval algorithm from the true ozone profile are analyzed. Although excellent agreement between a priori climatological and measured partial ozone values is found for the middle and upper stratosphere, relative differences greater than 15% are common for the troposphere and lower stratosphere.


2020 ◽  
Vol 23 (3) ◽  
pp. 473-493
Author(s):  
Nikita Andreevich Kataev ◽  
Alexander Andreevich Smirnov ◽  
Andrey Dmitrievich Zhukov

The use of pointers and indirect memory accesses in a program, as well as complex control flow, are among the main weaknesses of static program analysis. The program properties this analysis derives are too conservative to accurately describe program behavior, and hence they prevent parallel execution of the program. The application of dynamic analysis allows us to expand the capabilities of semi-automatic parallelization. In the SAPFOR system (System FOR Automated Parallelization), a dynamic analysis tool has been implemented, based on instrumentation of the LLVM representation of the analyzed program, which allows the system to explore programs in both the C and Fortran programming languages. The capabilities of the static analysis implemented in SAPFOR are used to reduce the overhead of program execution while maintaining the completeness of the analysis. The use of static analysis makes it possible to reduce the number of analyzed memory accesses and to ignore scalar variables, which can be explored statically. The developed tool was tested on performance tests from the NAS Parallel Benchmarks package for the C and Fortran languages. The implementation of dynamic analysis, in addition to traditional types of data dependencies (flow, anti, output), allows us to determine privatizable variables and the possibility of pipelined execution of loops. Together with the capabilities of DVM and OpenMP, this greatly facilitates program parallelization and simplifies the insertion of the appropriate compiler directives.
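The classification of flow, anti, and output dependencies from an instrumented run can be sketched in a few lines: record the addresses each loop iteration reads and writes, then classify cross-iteration overlaps. This is a toy illustration of the general trace-based technique, not SAPFOR's actual implementation; all names are invented.

```python
# Toy trace-based dynamic dependence detector. `trace` holds one
# (reads, writes) pair of address sets per loop iteration, as an
# instrumented run would record them.
def classify_dependencies(trace):
    deps = set()
    for i, (ri, wi) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            rj, wj = trace[j]
            if wi & rj: deps.add("flow")    # earlier write, later read
            if ri & wj: deps.add("anti")    # earlier read, later write
            if wi & wj: deps.add("output")  # two writes to the same location
    return deps

# A loop computing a[i] = a[i-1] + 1 leaves a loop-carried flow dependence:
trace = [({"a[%d]" % (i - 1)}, {"a[%d]" % i}) for i in range(1, 4)]
```

A loop whose trace yields an empty set (or only dependencies that privatization removes) is a candidate for parallel execution, which is the information the static analyzer alone could not prove.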


2020 ◽  
Author(s):  
Sudad H Al-Obaidi ◽  
Galkin AP

Knowledge of the properties of reservoir oil is necessary when calculating reserves, preparing development projects, and building hydrodynamic models of development objects. Reservoir oil properties are determined from downhole samples, typically taken from exploration and production wells. In some cases, it is impossible to create the conditions needed to collect high-quality downhole samples at exploration and production wells. In such cases, samples of surface oil must be used to obtain information about the reservoir properties of the oil. In this work, analysis of the accumulated data yielded dependencies with a high degree of correlation, which make it possible to quickly estimate the expected parameters of reservoir oil from the density of surface oil alone.
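Deriving such a correlation amounts to fitting a regression to paired measurements. The sketch below fits a simple least-squares line relating surface-oil density to a reservoir parameter; the data values and the choice of parameter are hypothetical and do not reproduce the paper's actual correlations.

```python
# Least-squares fit of y ≈ slope * x + intercept from paired measurements.
def linear_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical paired measurements: surface oil density (kg/m^3) against a
# reservoir parameter, e.g. gas-oil ratio or bubble-point pressure.
density = [820.0, 840.0, 860.0, 880.0]
parameter = [14.0, 12.5, 11.0, 9.5]
slope, intercept = linear_fit(density, parameter)
```

Once the coefficients are fixed from field data, evaluating the line at a measured surface density gives the quick estimate the abstract describes.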

