Low-Complexity Multiplication Using Complement and Signed-Digit Recoding Methods

2014 ◽  
Vol 619 ◽  
pp. 342-346
Author(s):  
Te Jen Chang ◽  
Ping Sheng Huang ◽  
Shan Jen Cheng ◽  
Ching Yin Chen ◽  
I Hui Pan

In this paper, a fast multiplication method utilizing complement representation and the canonical signed-digit recoding technique is proposed. By taking complements and applying canonical recoding, the number of partial products can be reduced. Based on these techniques, the proposed algorithm provides an efficient multiplication method. On average, it reduces the number of k-bit additions from (0.25k + log k/k + 2.5) to (k/6 + log k/k + 2.5), where k is the bit length of the multiplicand A and the multiplier B, thereby speeding up the overall multiplication. Moreover, when the proposed method is applied to common-multiplicand multiplication, the computational complexity can be reduced from (0.5k + 2 log k/k + 5) to (k/3 + 2 log k/k + 5) k-bit additions.
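
The recoding step can be illustrated with a short sketch. The snippet below computes the non-adjacent form (NAF), the standard canonical signed-digit recoding, and uses it to multiply by adding or subtracting shifted copies of the multiplicand; it is a minimal illustration of the recoding idea, not the authors' combined complement-and-recoding algorithm.

```python
def naf(n):
    """Non-adjacent form (canonical signed-digit recoding) of n >= 0.

    Returns digits in {-1, 0, 1}, least significant first. On average
    only ~1/3 of the digits are nonzero, versus ~1/2 for plain binary,
    so a multiplier built on this recoding generates fewer partial
    products.
    """
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)      # choose +1 or -1 so the next bit becomes 0
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def multiply_via_naf(a, b):
    """Multiply a*b by adding/subtracting shifted copies of a,
    one per nonzero recoded digit of b."""
    acc = 0
    for i, d in enumerate(naf(b)):
        if d:
            acc += d * (a << i)
    return acc

assert multiply_via_naf(1234, 5678) == 1234 * 5678
```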

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Yue Weng ◽  
Xi Zhang ◽  
Xiaohu Guo ◽  
Xianwei Zhang ◽  
Yutong Lu ◽  
...  

In the unstructured finite volume method, loops over different mesh components, such as cells, faces, and nodes, are widely used to traverse data. The loop mode determines direct or indirect data access and therefore strongly affects data locality; moreover, many threads accessing the same data within a mesh loop introduce data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of its hot spots under loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop mode affects data locality and data dependence differently. Specifically, the face loop yields the best data locality whenever kernels access face data. The cell loop incurs the smallest non-coalesced access overhead when both cell and node data are used without face data, and it performs best when kernels involve only indirect access to cell data. Atomic operations significantly degrade kernel performance on the K80, an effect that is much less pronounced on the V100. By choosing the suitable mesh loop mode in each kernel, the overall performance of the GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 over 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
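
The locality/dependence trade-off between loop modes can be sketched in a few lines. The toy below contrasts a face loop, whose scatter to the two adjacent cells needs atomics on a GPU (np.add.at is the serial analogue), with a cell loop that gathers its own fluxes without write conflicts; the mesh and field names are illustrative, not from the paper's code.

```python
import numpy as np

# Toy 1D unstructured mesh: n_cells cells, each interior face joins two cells.
n_cells = 8
owner    = np.arange(0, n_cells - 1)      # cell on one side of each face
neighbor = np.arange(1, n_cells)          # cell on the other side
phi      = np.random.rand(n_cells)        # cell-centred field

# Face loop: one "thread" per face computes a flux and scatters it to the
# two adjacent cells. The scatter is a write conflict, so a GPU kernel
# needs atomic adds here (np.add.at is the serial analogue).
flux = phi[neighbor] - phi[owner]         # contiguous reads of face data
residual_face = np.zeros(n_cells)
np.add.at(residual_face, owner, flux)
np.add.at(residual_face, neighbor, -flux)

# Cell loop: one "thread" per cell gathers the fluxes of its own faces.
# No write conflicts (each thread owns its output), but the reads of
# neighbouring cell data are indirect and may not coalesce.
residual_cell = np.zeros(n_cells)
for c in range(n_cells):
    for f in np.where((owner == c) | (neighbor == c))[0]:
        s = 1.0 if owner[f] == c else -1.0
        residual_cell[c] += s * (phi[neighbor[f]] - phi[owner[f]])

assert np.allclose(residual_face, residual_cell)
```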


Author(s):  
Vladimir Mic ◽  
Pavel Zezula

This chapter focuses on data searching, which is nowadays mostly based on similarity. Similarity search is challenging due to its computational complexity and the fact that similarity is subjective and context dependent. The authors assume the metric space model of similarity, defined by a domain of objects and a metric function that measures the dissimilarity of object pairs. Since the volume of contemporary data is large, the time efficiency of similarity query execution is essential. This chapter investigates transformations of a metric space to the Hamming space that decrease the memory and computational cost of the search. Various challenges of similarity search with sketches in the Hamming space are addressed, including the definition of the sketching transformation and efficient search algorithms that exploit sketches to speed up searching. The indexing of the Hamming space and a heuristic that facilitates the selection of a suitable sketching technique for a given application are also considered.
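
A minimal sketch of one common sketching transformation, assuming ball partitioning with median radii (the chapter covers a family of such techniques): each bit records whether the object falls inside a ball around a pivot, and search proceeds by filter-and-refine over Hamming distances.

```python
import numpy as np

rng = np.random.default_rng(0)
euclid = lambda x, y: float(np.linalg.norm(x - y))

def pivot_distances(objs, pivots):
    return np.array([[euclid(x, p) for p in pivots] for x in objs])

# Toy data: Euclidean vectors stand in for a generic metric space.
data   = rng.normal(size=(1000, 32))
pivots = data[rng.choice(len(data), 64, replace=False)]

# Ball-partitioning sketches: bit i is 1 iff the object lies inside the
# ball around pivot i; radii are set to the median distance so each bit
# splits the data roughly 50/50 (balanced bits carry the most information).
D = pivot_distances(data, pivots)
radii = np.median(D, axis=0)
sketches = (D <= radii)

# Filter-and-refine query: rank by cheap Hamming distance on sketches,
# then re-rank a short candidate list with the expensive metric.
q = rng.normal(size=32)
q_sketch = (pivot_distances([q], pivots)[0] <= radii)
ham = np.count_nonzero(sketches != q_sketch, axis=1)
candidates = np.argsort(ham)[:50]
best = min(candidates, key=lambda i: euclid(data[i], q))
print("approximate nearest neighbour:", best)
```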


Author(s):  
Arthur B. Markman

Cognitive psychology identifies different assumptions about the mental representations that form the basis of theories of comparison. Each representation requires a different process to generate a comparison, and the processes differ in both their computational complexity and their output. Spatial models require a low-complexity process but reveal only the distance between the points representing individuals. Featural models are more computationally intensive than spatial comparisons but provide access to particular commonalities and differences. Structural models are more computationally intensive still but support a distinction between alignable and nonalignable differences. Social comparison theories make assumptions about how knowledge is represented, but they are rarely explicit about the type of comparison process that is likely to be involved. Merging work on social comparison with more explicit cognitive science theories of comparison has the potential both to identify gaps in the literature and to expand our knowledge of how comparison operates in social settings. This chapter first discusses the concept of mental representation and then addresses spatial, featural, structural, and transformation models of comparison. It concludes with a discussion of similarity models and social comparison.
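
The complexity/output trade-off between spatial and featural comparison can be made concrete with a toy sketch; the feature sets and weights below are illustrative, with the featural case following Tversky's contrast model as a standard example.

```python
import math

# Spatial model: a low-complexity comparison that reveals only *how*
# similar two individuals are, as a distance between points.
def spatial_similarity(x, y):
    return -math.dist(x, y)

# Featural model (Tversky's contrast model): similarity weighs common
# features against each object's distinctive features, so the
# comparison also exposes *in what way* the objects are alike or differ.
def featural_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

robin   = {"flies", "sings", "lays eggs", "small"}
penguin = {"swims", "lays eggs", "large"}

print(spatial_similarity((0.0, 0.0), (3.0, 4.0)))  # -5.0: distance only
print(featural_similarity(robin, penguin))          # -1.5
print(robin & penguin)                              # the particular commonality
```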


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1314
Author(s):  
Taeoh Kim ◽  
Hyobeen Park ◽  
Yunho Jung ◽  
Seongjoo Lee

In this paper, we propose a tag sensor using multiple antennas in a Wi-Fi backscatter system, which improves the data rate or the reliability of the signal transmitted from a tag sensor to a reader. The existing power-level modulation method, proposed to improve the data rate in Wi-Fi backscatter systems, suffers from low reliability due to the reduced distance between symbols. To address this problem, we propose a Wi-Fi backscatter system that obtains channel diversity by applying multiple antennas. Two backscatter methods are described for improving the data rate or the reliability of the proposed system. In addition, we propose three low-complexity demodulation methods to address the high computational complexity caused by multiple antennas: (1) the SET (subcarrier energy-based threshold) method, (2) the TCST (tag's channel state-based threshold) method, and (3) the SED (similar Euclidean distance) method. To verify the performance of the proposed backscatter method and the low-complexity demodulation schemes, the 802.11 TGn (Task Group n) channel model was used in simulation. The proposed tag sensor structure was compared with existing methods that use only sub-channels with a large difference in received CSI (channel state information) values or that adopt power-level modulation. The proposed scheme showed about 10 dB better bit error rate (BER) performance and higher throughput. Moreover, the proposed low-complexity demodulation schemes matched the BER performance of the existing Euclidean distance method to within 1 dB while reducing the computational complexity by up to 60%.
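
As an illustration of the threshold idea behind such demodulators, the sketch below decides a tag bit by comparing summed subcarrier energy against a trained threshold; the exact SET, TCST, and SED decision rules are specified in the paper, and the dimensions and signal model here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def demodulate_energy(csi, threshold):
    """Decide each tag bit from summed subcarrier energy.

    csi: (n_symbols, n_subcarriers) complex channel estimates; the
    tag's reflection state perturbs the received energies.
    """
    energy = np.sum(np.abs(csi) ** 2, axis=1)
    return (energy > threshold).astype(int)

# Train a threshold as the midpoint between the two symbol classes
# (52 subcarriers as in 802.11n; amplitudes are illustrative).
csi_bit0 = rng.normal(size=(100, 52)) + 1j * rng.normal(size=(100, 52))
csi_bit1 = 1.5 * (rng.normal(size=(100, 52)) + 1j * rng.normal(size=(100, 52)))
e0 = np.mean(np.sum(np.abs(csi_bit0) ** 2, axis=1))
e1 = np.mean(np.sum(np.abs(csi_bit1) ** 2, axis=1))
threshold = 0.5 * (e0 + e1)

bits = demodulate_energy(np.vstack([csi_bit0, csi_bit1]), threshold)
truth = np.concatenate([np.zeros(100, int), np.ones(100, int)])
print("bit errors:", np.count_nonzero(bits != truth))
```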


Author(s):  
Siyu Liao ◽  
Bo Yuan

Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as a powerful technique in various machine learning applications. However, the large model sizes of DNNs place high demands on computation resources and weight storage, limiting their practical deployment. To overcome these limitations, this paper proposes to impose a circulant structure on the construction of convolutional layers, leading to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, making the approach flexible across training environments. Extensive experiments show that this strong structure-imposing approach substantially reduces the number of parameters of convolutional layers and enables significant savings in computational cost via fast multiplication with the circulant tensor.
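
The source of the computational saving is that a circulant matrix-vector product is a circular convolution, computable with the FFT in O(n log n) instead of O(n^2). A minimal sketch of this building block (a real CircConv layer applies the same idea block-wise across channels):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by vector x
    via the FFT: O(n log n) instead of O(n^2), using only the n
    parameters in c instead of n*n."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n = 8
c = np.random.rand(n)          # n parameters instead of n*n
x = np.random.rand(n)

# Check against the explicit circulant matrix (column k is c rolled by k).
C = np.column_stack([np.roll(c, k) for k in range(n)])
assert np.allclose(C @ x, circulant_matvec(c, x))
```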


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 980
Author(s):  
Hui Feng ◽  
Xiaoqing Zhao ◽  
Zhengquan Li ◽  
Song Xing

In this paper, a novel iterative discrete estimation (IDE) algorithm, called the modified IDE (MIDE), is proposed to reduce the computational complexity of MIMO detection in uplink massive MIMO systems. MIDE is a revision of the alternating direction method of multipliers (ADMM)-based algorithm, in which a self-updating method is designed for the damping factor: it is estimated and updated at each iteration from the Euclidean distance between the iterative solutions of the IDE-based algorithm, in order to accelerate convergence. Compared to the existing ADMM-based detection algorithm, the overall computational complexity of the proposed MIDE algorithm is reduced from O(N_t^3) + O(N_r N_t^2) to O(N_t^2) + O(N_r N_t) in terms of the number of complex-valued multiplications, where N_t and N_r are the number of users and the number of receiving antennas at the base station (BS), respectively. Simulation results show that the proposed MIDE algorithm achieves a better bit error rate (BER) than some recently proposed approximation algorithms for MIMO detection in uplink massive MIMO systems.
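
A hedged sketch of the iteration pattern: alternate a cheap matrix-vector update with a projection onto the constellation, and let a damping factor shrink as successive iterates approach each other. The damping update rule below is an assumption for illustration; the paper defines its own.

```python
import numpy as np

rng = np.random.default_rng(2)

def mide_like_detect(H, y, constellation, iters=20):
    """Illustrative iterative discrete estimation for MIMO detection.

    Each iteration costs only matrix-vector products, O(Nr*Nt), with a
    per-entry projection onto the constellation. The damping factor is
    self-updated from the distance between successive iterates (an
    assumed rule; the exact MIDE update differs)."""
    Nt = H.shape[1]
    x = np.zeros(Nt, dtype=complex)
    x_prev = x.copy()
    damping = 1.0
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    for _ in range(iters):
        grad = H.conj().T @ (H @ x - y)              # O(Nr*Nt) work
        z = x - step * grad
        # project each entry onto the nearest constellation point
        x_hard = constellation[np.argmin(np.abs(z[:, None] - constellation), axis=1)]
        x_new = damping * x_hard + (1 - damping) * z
        damping = 1.0 / (1.0 + np.linalg.norm(x_new - x_prev))   # assumed rule
        x_prev, x = x, x_new
    return constellation[np.argmin(np.abs(x[:, None] - constellation), axis=1)]

# QPSK uplink with Nr=64 base-station antennas and Nt=8 users.
qpsk = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
H = (rng.normal(size=(64, 8)) + 1j * rng.normal(size=(64, 8))) / np.sqrt(2)
x_true = qpsk[rng.integers(0, 4, 8)]
y = H @ x_true + 0.05 * (rng.normal(size=64) + 1j * rng.normal(size=64))
print("symbol errors:", np.count_nonzero(mide_like_detect(H, y, qpsk) != x_true))
```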


2017 ◽  
Vol 22 (2) ◽  
pp. 460-472 ◽  
Author(s):  
Weiwei Li ◽  
Wen Chen ◽  
Zhuojia Fu

This study makes the first attempt to accelerate the singular boundary method (SBM) by the precorrected FFT (PFFT) for large-scale three-dimensional potential problems. The SBM with a GMRES solver requires O(N^2) computational complexity, where N is the number of unknowns. To speed up the SBM, the PFFT is employed to accelerate the SBM matrix-vector multiplication at each iteration step of GMRES. Consequently, the computational complexity can be reduced to O(N log N). Several numerical examples are presented to validate the developed PFFT-accelerated SBM (PFFT-SBM) scheme, and the results are compared with those of the SBM without the PFFT and with analytical solutions. It is clearly found that the present PFFT-SBM is very efficient and suitable for 3D large-scale potential problems.
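
The acceleration pattern, sketched under simplifying assumptions (a 1D translation-invariant kernel standing in for the 3D SBM matrix): hand GMRES a matrix-free operator whose matvec is evaluated by circulant embedding and the FFT.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, gmres

# Dense SBM-style system: each GMRES iteration costs O(N^2) for the
# matrix-vector product. When the kernel depends only on the distance
# between nodes, the matvec is a convolution, and circulant embedding
# plus the FFT brings it down to O(N log N), which is the essence of
# the PFFT idea (the real PFFT also projects non-uniform 3D nodes onto
# a regular grid and precorrects nearby interactions).
N = 1024
x = np.linspace(0.0, 1.0, N)
col = (1.0 / N) / (np.abs(x - x[0]) + 1e-2)   # first column of a symmetric Toeplitz kernel matrix
col[0] += 1.0                                  # second-kind formulation keeps the system well conditioned

def fast_matvec(v):
    """O(N log N) Toeplitz matvec via circulant embedding + FFT."""
    c = np.concatenate([col, [0.0], col[:0:-1]])    # circulant first column, size 2N
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([v, np.zeros(N)])))
    return np.real(w[:N])

A = LinearOperator((N, N), matvec=fast_matvec)
b = np.ones(N)
sol, info = gmres(A, b)

# Verify against the explicit dense O(N^2) operator.
assert info == 0 and np.allclose(toeplitz(col) @ sol, b, atol=1e-3)
```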


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
Xinhe Zhang ◽  
Yuehua Zhang ◽  
Chang Liu ◽  
Hanzhong Jia

In this paper, the authors propose three low-complexity detection schemes for spatial modulation (SM) systems based on modified beam search (MBS) detection. The MBS detector, which splits the search tree into subtrees, reduces the computational complexity by decreasing the number of nodes retained in each layer. However, the MBS detector takes into account neither the effect of the subtree search order on computational complexity nor the effect of the layer search order on bit-error-rate (BER) performance. The ost-MBS detector starts the search from the subtree where the optimal solution is most likely to be located, which reduces the total number of nodes searched in subsequent subtrees and thus decreases the computational complexity. When the number of retained nodes is fixed, which nodes are retained matters: different layer search orders directly influence the BER. Based on this observation, we propose the oy-MBS detector. The ost-oy-MBS detector combines the detection orders of ost-MBS and oy-MBS. Algorithm analysis and experimental results show that the proposed detectors outperform MBS in both BER performance and computational complexity.
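
The retained-nodes idea can be sketched as a plain beam search over detection layers; the subtree splitting and the ost/oy orderings themselves are not reproduced here, and the QR-based tree formulation below is a standard choice assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def beam_search_detect(R, z, constellation, beam=4):
    """Layer-by-layer tree search keeping only `beam` nodes per layer.

    R: upper-triangular factor from QR of the channel, z = Q^H y.
    Detection runs from the last layer upward, as in sphere decoding.
    Truncating to `beam` survivors is where complexity is saved, and
    why the order in which layers are visited affects which nodes
    survive (and hence the BER)."""
    n = R.shape[0]
    partial = [((), 0.0)]                  # (symbols for layers l..n-1, cost)
    for layer in range(n - 1, -1, -1):
        expanded = []
        for path, cost in partial:
            for s in constellation:
                trial = (s,) + path
                err = z[layer] - R[layer, layer:] @ np.array(trial)
                expanded.append((trial, cost + abs(err) ** 2))
        expanded.sort(key=lambda t: t[1])
        partial = expanded[:beam]          # retain only the cheapest nodes
    return np.array(partial[0][0])

qpsk = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
H = (rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))) / np.sqrt(2)
x = qpsk[rng.integers(0, 4, 6)]
y = H @ x + 0.01 * (rng.normal(size=6) + 1j * rng.normal(size=6))
Q, R = np.linalg.qr(H)
x_hat = beam_search_detect(R, Q.conj().T @ y, qpsk, beam=4)
print("symbol errors:", np.count_nonzero(x_hat != x))
```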


2020 ◽  
Vol 10 (15) ◽  
pp. 5051
Author(s):  
Žarko Zečević ◽  
Maja Rolevski

Photovoltaic (PV) modules require maximum power point tracking (MPPT) algorithms to ensure that the amount of power extracted is maximized. In this paper, we propose a low-complexity MPPT algorithm based on a neural network (NN) model of the photovoltaic module. Namely, the expression for the output current of the NN model is used to derive analytical, iterative rules for determining the maximum power point (MPP) voltage and for irradiance estimation. In this way, the computational complexity is reduced compared to other NN-based MPPT methods, in which the optimal voltage is predicted directly from the measurements. The proposed algorithm cannot determine the optimal voltage instantaneously, but it contains a tunable parameter that controls the trade-off between tracking speed and computational complexity. Numerical results indicate that the relative error between the actual maximum power and the power obtained by the proposed algorithm is less than 0.1%, up to ten times smaller than that of the available algorithms.
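
A minimal sketch of the idea, assuming a differentiable stand-in for the trained NN current model: the MPP voltage is reached by gradient ascent on P(V) = V·I(V), with the step size playing the role of the tunable speed/complexity parameter.

```python
import numpy as np

def model_current(V):
    """Stand-in for the NN model of the PV I-V curve (illustrative:
    near-constant current that drops steeply toward open circuit)."""
    return 4.0 * (1.0 - np.tanh(0.3 * (V - 35.0)))

def model_dcurrent(V):
    """Analytic derivative dI/dV of the stand-in model."""
    return -1.2 / np.cosh(0.3 * (V - 35.0)) ** 2

def track_mpp(V0=10.0, eta=0.05, iters=200):
    """Iterative MPP search: ascend the power curve P(V) = V * I(V).
    eta trades tracking speed against work per control cycle."""
    V = V0
    for _ in range(iters):
        dP = model_current(V) + V * model_dcurrent(V)   # dP/dV by product rule
        V += eta * dP
    return V

V_mpp = track_mpp()
print(f"MPP voltage ~= {V_mpp:.2f} V, power ~= {V_mpp * model_current(V_mpp):.2f} W")
```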


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Van-Khoi Dinh ◽  
Minh-Tuan Le ◽  
Vu-Duc Ngo ◽  
Chi-Hieu Ta

In this paper, a low-complexity linear precoding algorithm based on the principal component analysis technique combined with conventional linear precoders, called the Principal Component Analysis Linear Precoder (PCA-LP), is proposed for massive MIMO systems. The proposed precoder consists of two components: the first minimizes the interference among neighboring users, and the second improves system performance by utilizing the Principal Component Analysis (PCA) technique. Numerical and simulation results show that the proposed precoder has remarkably lower computational complexity than the low-complexity lattice-reduction-aided regularized block diagonalization with zero-forcing precoding (LC-RBD-LR-ZF) and lower computational complexity than the PCA-aided Minimum Mean Square Error with Block Diagonalization (PCA-MMSE-BD) counterpart, while its bit error rate (BER) performance is comparable to those of LC-RBD-LR-ZF and PCA-MMSE-BD.
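
A sketch of one plausible two-stage construction, assuming block-diagonalization nulling for the first stage followed by a PCA step for the second; the paper's exact PCA-LP design differs.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stage 1 suppresses inter-user interference by projecting each user's
# transmission onto the null space of the other users' channels (block
# diagonalization); stage 2 keeps only the principal component
# (dominant right singular vector) of the resulting effective channel.
n_tx, n_rx_per_user, n_users = 16, 2, 4
H = [rng.normal(size=(n_rx_per_user, n_tx)) for _ in range(n_users)]

precoders = []
for k in range(n_users):
    # stage 1: null space of all other users' stacked channels
    H_others = np.vstack([H[j] for j in range(n_users) if j != k])
    _, _, Vh = np.linalg.svd(H_others)
    V_null = Vh[H_others.shape[0]:].conj().T        # basis of the null space
    # stage 2: PCA of the effective channel H_k @ V_null
    _, _, Vh_eff = np.linalg.svd(H[k] @ V_null)
    precoders.append(V_null @ Vh_eff[0].conj())     # strongest direction only

# Interference check: user k's precoder is invisible to every user j != k.
for k in range(n_users):
    for j in range(n_users):
        if j != k:
            assert np.allclose(H[j] @ precoders[k], 0.0, atol=1e-10)
```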

