Reducing Underflow in Mixed Precision Training by Gradient Scaling

Author(s):  
Ruizhe Zhao ◽  
Brian Vogel ◽  
Tanvir Ahmed ◽  
Wayne Luk

By leveraging the half-precision floating-point format (FP16), which is well supported by recent GPUs, mixed precision training (MPT) enables us to train larger models under the same or even a smaller budget. However, due to the limited representation range of FP16, gradients can often experience severe underflow problems that hinder backpropagation and degrade model accuracy. MPT adopts loss scaling, which scales up the loss value just before backpropagation starts, to mitigate underflow by enlarging the magnitude of gradients. Unfortunately, scaling once is insufficient: gradients from distinct layers can have different data distributions and require non-uniform scaling, and heuristics and hyperparameter tuning are needed to minimize the side effects of loss scaling. We propose gradient scaling, a novel method that analytically calculates the appropriate scale for each gradient on the fly. It addresses underflow effectively without numerical problems such as overflow and without the need for tedious hyperparameter tuning. Experiments on a variety of networks and tasks show that gradient scaling can improve accuracy and reduce overall training effort compared with state-of-the-art MPT.
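The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of analytically picking a per-tensor power-of-two scale from a gradient's magnitude before an FP16 round-trip; the function name and the particular scaling rule are assumptions for illustration, not the authors' published algorithm.

```python
import torch

FP16_MAX = 65504.0  # largest finite half-precision value

def fp16_roundtrip_with_scaling(grad_fp32: torch.Tensor) -> torch.Tensor:
    """Pick a power-of-two scale so a gradient survives a round-trip
    through FP16, then undo the scale (illustrative sketch only)."""
    max_abs = grad_fp32.abs().max()
    if max_abs == 0:
        return grad_fp32
    # Largest power of two that keeps max_abs safely below FP16_MAX;
    # small values are thereby lifted away from the underflow threshold.
    scale = 2.0 ** torch.floor(torch.log2(FP16_MAX / (2.0 * max_abs)))
    return (grad_fp32 * scale).half().float() / scale
```

Because the scale is derived from the tensor itself, no global loss-scale hyperparameter has to be tuned, which is the property the abstract emphasizes.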

2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

Field-programmable gate arrays (FPGAs) are a high-performance computing platform for convolutional neural network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this results in irregular sparse patterns, leading to low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve model accuracy with the SRBS pattern. Based on the pruned model, we apply mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed-precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy-efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
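To make the SRBS idea concrete, here is a rough NumPy sketch of a sub-row-balanced pruning step: every fixed-length segment of a row keeps the same number of largest-magnitude weights, so the sparsity pattern is regular enough for parallel hardware. The segment length and survivor count are illustrative parameters, not the paper's configuration.

```python
import numpy as np

def srbs_prune(w: np.ndarray, sub_row_len: int = 4, keep: int = 2) -> np.ndarray:
    """Keep the `keep` largest-magnitude weights in each length-`sub_row_len`
    segment of every row (a sketch of sub-row-balanced sparsity)."""
    assert w.shape[1] % sub_row_len == 0
    out = np.zeros_like(w)
    for r in range(w.shape[0]):
        for c0 in range(0, w.shape[1], sub_row_len):
            seg = w[r, c0:c0 + sub_row_len]
            top = np.argsort(np.abs(seg))[-keep:]  # indices of survivors
            out[r, c0 + top] = seg[top]
    return out
```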


2019 ◽  
Author(s):  
Oriol Tintó Prims ◽  
Mario C. Acosta ◽  
Andrew M. Moore ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

Abstract. Mixed-precision approaches can provide substantial speed-ups for both compute-bound and memory-bound codes with little effort. Most scientific codes over-engineer their numerical precision, so models use more resources than required without any indication of where those resources are unnecessary and where they are really needed. Consequently, performance benefits can be obtained from a more appropriate choice of precision, and the only thing needed is a method to determine which real variables can be represented with fewer bits without affecting the accuracy of the results. This paper presents a novel method that enables modern and legacy codes to benefit from a reduction of precision without sacrificing accuracy. It rests on a simple idea: if we can measure how reducing the precision of a group of variables affects the outputs, we can evaluate the level of precision this group of variables needs. Modifying and recompiling the code for each case to be evaluated would require a prohibitive amount of effort. Instead, the method presented in this paper relies on a tool called the Reduced Precision Emulator (RPE) that can significantly streamline the process. Using the RPE and a list of parameters containing the precision to be used for each real variable in the code, it is possible within a single binary to emulate the effect of a specific choice of precision on the outputs. Once we can emulate the effects of reduced precision, we can design the tests required to learn about all the variables in the model. The number of possible combinations is prohibitively large and impossible to explore exhaustively. The alternative of screening the variables individually can give some insight into the precision needed by each variable, but more complex interactions involving several variables may remain hidden. Instead, we use a divide-and-conquer algorithm that identifies the parts that cannot handle reduced precision and builds a set of variables that can. The method has been put to the test using two state-of-the-art ocean models, NEMO and ROMS, with very promising results. Obtaining this information is crucial for subsequently building an actual mixed-precision version of the code that will deliver the promised performance benefits.
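The two ingredients of the method, emulating reduced precision and searching groups of variables, can be illustrated with a toy Python sketch. The RPE itself is a Fortran library, so everything below (truncating mantissa bits of a double, and a recursive group test via hypothetical `run_model` and `accurate` callbacks) is a stand-in for exposition, not the tool's API.

```python
import struct

def reduce_precision(x: float, sbits: int) -> float:
    """Emulate a float with `sbits` significand bits (0..52) by zeroing
    the trailing mantissa bits of an IEEE-754 double (truncation only)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    mask = ~((1 << (52 - sbits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack('<d', struct.pack('<Q', bits & mask))[0]

def find_reducible(variables, run_model, accurate, sbits):
    """Divide and conquer: return the subset of `variables` that tolerates
    `sbits` bits. Variables left out of the dict run at full precision."""
    if not variables:
        return []
    if accurate(run_model({v: sbits for v in variables})):
        return list(variables)          # the whole group tolerates sbits
    if len(variables) == 1:
        return []                       # this variable needs more precision
    mid = len(variables) // 2
    return (find_reducible(variables[:mid], run_model, accurate, sbits) +
            find_reducible(variables[mid:], run_model, accurate, sbits))
```

As the abstract notes, this bisection cannot expose every interaction between variables, but it prunes the search space far faster than testing combinations exhaustively.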


2019 ◽  
Vol 19 (25) ◽  
pp. 2348-2356 ◽  
Author(s):  
Neng-Zhong Xie ◽  
Jian-Xiu Li ◽  
Ri-Bo Huang

Acetoin is an important four-carbon compound that has many applications in foods, chemical synthesis, cosmetics, cigarettes, soaps, and detergents. Its stereoisomer (S)-acetoin, a high-value chiral compound, can also be used to synthesize optically active drugs, which could enhance targeting properties and reduce side effects. Recently, considerable progress has been made in the development of biotechnological routes for (S)-acetoin production. In this review, various strategies for biological (S)-acetoin production are summarized, and their constraints and possible solutions are described. Furthermore, future prospects for the biological production of (S)-acetoin are discussed.


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Aysen Degerli ◽  
Mete Ahishali ◽  
Mehmet Yamac ◽  
Serkan Kiranyaz ◽  
Muhammad E. H. Chowdhury ◽  
...  

Abstract. Computer-aided diagnosis has become a necessity for accurate and immediate coronavirus disease 2019 (COVID-19) detection to aid treatment and prevent the spread of the virus. Numerous studies have proposed using deep learning techniques for COVID-19 diagnosis. However, they have been evaluated on very limited chest X-ray (CXR) image repositories containing only a few hundred COVID-19 samples. Moreover, these methods can neither localize nor grade the severity of COVID-19 infection. For this purpose, recent studies have proposed exploring the activation maps of deep networks. However, they remain inaccurate at localizing the actual infection, making them unreliable for clinical use. This study proposes a novel method for the joint localization, severity grading, and detection of COVID-19 from CXR images by generating so-called infection maps. To accomplish this, we have compiled the largest dataset, with 119,316 CXR images including 2,951 COVID-19 samples, where the ground-truth segmentation masks are annotated on CXRs by a novel collaborative human–machine approach. Furthermore, we publicly release the first CXR dataset with ground-truth segmentation masks of the COVID-19 infected regions. A detailed set of experiments shows that state-of-the-art segmentation networks can learn to localize COVID-19 infection with an F1-score of 83.20%, which is significantly superior to the activation maps created by previous methods. Finally, the proposed approach achieved a COVID-19 detection performance of 94.96% sensitivity and 99.88% specificity.
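For reference, the reported metrics are the conventional pixel-wise F1 for localization and per-image sensitivity/specificity for detection; a small NumPy sketch of how such numbers are typically computed (not the study's evaluation code) follows.

```python
import numpy as np

def f1_score(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Pixel-wise F1 between predicted and ground-truth boolean masks."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    fp = np.logical_and(pred_mask, ~true_mask).sum()
    fn = np.logical_and(~pred_mask, true_mask).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def sensitivity_specificity(pred: np.ndarray, label: np.ndarray):
    """Detection sensitivity and specificity from binary 0/1 arrays."""
    tp = np.sum((pred == 1) & (label == 1))
    tn = np.sum((pred == 0) & (label == 0))
    fn = np.sum((pred == 0) & (label == 1))
    fp = np.sum((pred == 1) & (label == 0))
    return tp / (tp + fn), tn / (tn + fp)
```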


Author(s):  
Wei-Fan Chiang ◽  
Mark Baranowski ◽  
Ian Briggs ◽  
Alexey Solovyev ◽  
Ganesh Gopalakrishnan ◽  
...  

Author(s):  
Mingliang Xu ◽  
Qingfeng Li ◽  
Jianwei Niu ◽  
Hao Su ◽  
Xiting Liu ◽  
...  

Quick response (QR) codes are usually scanned in varied environments, so they must be robust to variations in illumination, scale, coverage, and camera angle. Aesthetic QR codes improve visual quality, but subtle changes in their appearance may cause scanning failure. In this article, a new method to generate scanning-robust aesthetic QR codes is proposed, based on a module-based scanning probability estimation model that can effectively balance the tradeoff between visual quality and scanning robustness. Our method locally adjusts the luminance of each module by estimating the probability of successful sampling. The approach adopts a hierarchical, coarse-to-fine strategy to enhance the visual quality of aesthetic QR codes, sequentially generating three codes: a binary aesthetic QR code, a grayscale aesthetic QR code, and the final color aesthetic QR code. Our approach can also be used to create QR codes with different visual styles by adjusting some initialization parameters. User surveys and decoding experiments were used to evaluate our method against state-of-the-art algorithms, indicating that the proposed approach achieves excellent performance in terms of both visual quality and scanning robustness.
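A simplified view of module-based probability estimation: treat the sampled luminance as the nominal luminance plus noise, and estimate the chance the sample lands on the correct side of the binarization threshold. The Gaussian-noise model, thresholds, and step size below are assumptions for illustration, not the paper's estimation model.

```python
from math import erf, sqrt

def scan_probability(lum: float, is_dark: bool,
                     thr: float = 0.5, sigma: float = 0.1) -> float:
    """Chance a module is sampled on the correct side of threshold `thr`,
    assuming Gaussian sampling noise of standard deviation `sigma`."""
    margin = (thr - lum) if is_dark else (lum - thr)
    return 0.5 * (1.0 + erf(margin / (sigma * sqrt(2.0))))

def adjust_module(lum: float, is_dark: bool,
                  target: float = 0.95, step: float = 0.02) -> float:
    """Nudge a module's luminance toward its nominal color until the
    estimated scanning probability reaches `target`."""
    while scan_probability(lum, is_dark) < target and 0.0 < lum < 1.0:
        lum = max(lum - step, 0.0) if is_dark else min(lum + step, 1.0)
    return lum
```

Adjusting each module only as far as the probability model demands is what lets such a scheme trade visual quality against robustness module by module.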


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1517
Author(s):  
Xinsheng Wang ◽  
Xiyue Wang

True random number generators (TRNGs) have been a research hotspot due to the requirements of secure encryption algorithms, and such circuits are necessary building blocks in state-of-the-art security controllers. In this paper, a TRNG based on random telegraph noise (RTN) with a controllable rate is proposed. A novel noise-array circuit is presented, consisting of digital decoder circuits and RTN noise circuits. The rate at which random numbers are generated is controlled by the speed at which different gating signals are selected. Simulation results show that the array, consisting of 64 noise-source circuits, can generate random numbers at frequencies from 1 kHz to 16 kHz.
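A behavioral software model of the described architecture might look like the following: a bank of two-state RTN sources and a decoder that gates one source per tick, so the selection speed sets the output bit rate. The toggle probability and structure are assumptions for illustration, not the paper's circuit.

```python
import random

class RTNSource:
    """Two-state random telegraph noise source: a trap toggles between
    'captured' and 'emitted' with some probability at each sample."""
    def __init__(self, toggle_prob: float = 0.3):
        self.state = random.randint(0, 1)
        self.toggle_prob = toggle_prob

    def sample(self) -> int:
        if random.random() < self.toggle_prob:
            self.state ^= 1
        return self.state

# A 6-bit decoder selects one of 64 sources per tick; ticking the
# selector faster or slower sets the random-bit output frequency.
sources = [RTNSource() for _ in range(64)]

def next_bit(select: int) -> int:
    return sources[select & 0x3F].sample()  # 6-bit gating signal

bits = [next_bit(i % 64) for i in range(16)]  # e.g., round-robin gating
```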


2022 ◽  
Vol 29 (2) ◽  
pp. 1-33
Author(s):  
Nigel Bosch ◽  
Sidney K. D'Mello

The ability to identify from video whether a user is “zoning out” (mind wandering) has many HCI applications (e.g., distance learning, high-stakes vigilance tasks). However, it remains unknown how well humans can perform this task, how they compare to automatic computerized approaches, and how a fusion of the two might improve accuracy. We analyzed videos of users’ faces and upper bodies recorded 10 s prior to self-reported mind wandering (i.e., ground truth) while they engaged in a computerized reading task. We found that a state-of-the-art machine learning model had accuracy comparable to the aggregated judgments of nine untrained human observers (area under the receiver operating characteristic curve [AUC] = .598 versus .589). A fusion of the two (AUC = .644) outperformed each alone, presumably because each focused on complementary cues. Furthermore, adding more humans beyond 3–4 observers yielded diminishing returns. We discuss implications of human–computer fusion as a means to improve accuracy in complex tasks.
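One common way to realize such a fusion is late fusion of scores followed by AUC evaluation; the sketch below (using scikit-learn's `roc_auc_score`) is one illustrative scheme, not necessarily the authors' fusion method.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fuse(model_probs: np.ndarray, human_ratings: np.ndarray) -> np.ndarray:
    """Average the model's probability with the mean human rating,
    after rescaling each source to [0, 1]."""
    def rescale(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    human_mean = rescale(human_ratings.mean(axis=1))  # observers on axis 1
    return 0.5 * rescale(model_probs) + 0.5 * human_mean

# labels: 1 = self-reported mind wandering, 0 = attentive
# auc_fused = roc_auc_score(labels, fuse(model_probs, human_ratings))
```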

