Scope and Challenges of Language Modelling - An Interrogative Survey on Context and Embeddings

Author(s): Matthias Nitsche, Marina Tropmann-Frick
Author(s): Saad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li

PLoS ONE, 2020, Vol 15 (3), pp. e0229963
Author(s): Ignat Drozdov, Daniel Forbes, Benjamin Szubert, Mark Hall, Chris Carlin, ...
Keyword(s): X Ray

1998, Vol 5 (3), pp. 246-255
Author(s): Royal Skousen

Author(s): Sarah Samson Juan, Muhamad Fikri Che Ismail, Hamimah Ujir, Irwandi Hipiny

2013
Author(s): X. Liu, M. J. F. Gales, P. C. Woodland

Author(s): Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, ...

8-bit integer inference, a promising direction for reducing both the latency and the storage cost of deep neural networks, has made great progress recently. However, previous systems still rely on 32-bit floating point for certain functions in complex models (e.g., Softmax in the Transformer) and make heavy use of quantization and de-quantization. In this work, we show that after a principled modification of the Transformer architecture, dubbed Integer Transformer, an (almost) fully 8-bit integer inference algorithm, Scale Propagation, can be derived. De-quantization is adopted only when necessary, which keeps the network efficient. Our experiments on the WMT16 En<->Ro, WMT14 En<->De and En->Fr translation tasks, as well as the WikiText-103 language modelling task, show that the fully 8-bit Transformer system achieves performance comparable to the floating-point baseline while requiring a nearly 4x smaller memory footprint.
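To make the idea of scale propagation concrete, the following is a minimal sketch, not the paper's implementation: symmetric per-tensor 8-bit quantization where the scale of an integer matrix multiply is simply the product of the input scales, so de-quantization is deferred until a floating-point function (such as Softmax) actually needs real values. All function names (quantize, int8_matmul, dequantize) are illustrative assumptions, not identifiers from the paper's code.

import numpy as np

def quantize(x, num_bits=8):
    # Symmetric per-tensor quantization: int8 values plus a float scale.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax + 1e-12
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_matmul(q_a, scale_a, q_b, scale_b):
    # Integer matmul with int32 accumulation; the output scale is the
    # product of the input scales, so no de-quantization is needed here.
    acc = q_a.astype(np.int32) @ q_b.astype(np.int32)
    return acc, scale_a * scale_b

def dequantize(q, scale):
    # Return to floating point only where a float-only function requires it.
    return q.astype(np.float32) * scale

# Usage: propagate scales through a matmul, de-quantize once at the end.
rng = np.random.default_rng(0)
a, b = rng.standard_normal((4, 8)), rng.standard_normal((8, 3))
qa, sa = quantize(a)
qb, sb = quantize(b)
acc, s_out = int8_matmul(qa, sa, qb, sb)
approx = dequantize(acc, s_out)
print(np.max(np.abs(approx - a @ b)))   # small quantization error

In this sketch the intermediate result never leaves the integer domain between quantization and the final de-quantization, which is the general property that lets an (almost) fully 8-bit pipeline avoid repeated quantize/de-quantize round trips.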

