Optimizing Small BERTs Trained for German NER

Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 443
Author(s):  
Jochen Zöllner ◽  
Konrad Sperfeld ◽  
Christoph Wick ◽  
Roger Labahn

Currently, the most widespread neural network architecture for training language models is BERT, which has led to improvements in various NLP tasks. In general, the more parameters a BERT model has, the better its results on these tasks. Unfortunately, memory consumption and training duration increase drastically with model size. In this article, we investigate various training techniques for smaller BERT models: we combine different methods from other BERT variants, such as ALBERT, RoBERTa, and relative positional encoding. In addition, we propose two new fine-tuning modifications that lead to better performance: CSE tagging and a modified form of LCRF. Furthermore, we introduce WWA, which reduces BERT's memory usage and leads to a small increase in performance compared to classical Multi-Head-Attention. We evaluate these techniques on five public German NER tasks, two of which are introduced in this article.

2019 ◽  
Vol 8 (2S11) ◽  
pp. 2593-2599

Economic growth, measured as the rate of increase in GDP, strongly informs government predictions about the economic situation and the formulation of economic development strategies. Forecasting it combines mathematical and computational techniques to make qualitative and quantitative predictions of economic growth trends. It makes practical sense to use scientific, proven methods to predict the future GDP development of a particular economy. In some cases, machine learning methods have produced better forecasting results than statistical methods. A Deep Neural Network (DNN) is a type of Artificial Neural Network (ANN) architecture based on a deep Multi-Layer Perceptron (MLP) and trained with deep learning techniques. This study proposes using a DNN to predict the percentage distribution of GDP at current prices by industry sector. The DNN therefore has as many outputs as there are industry sectors. The aim of the study is to predict the next period with the smallest possible prediction error.
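The abstract does not say how the output layer is built or how the error is measured. As a minimal sketch, assuming one raw score per industry sector, a softmax scaled to 100 is one way to turn the scores into a percentage distribution, and a mean absolute error is one way to compare it with observed shares. The function names and the sample scores are hypothetical and used for illustration only.

```python
import math


def softmax_percentages(scores):
    """Map raw per-sector scores to shares that sum to 100 percent."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def mean_absolute_error(predicted, actual):
    """Average absolute difference across all sectors (outputs)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical raw network outputs for four industry sectors.
raw_scores = [1.2, 0.3, -0.5, 0.8]
shares = softmax_percentages(raw_scores)
print([round(s, 2) for s in shares])
```

With this choice the predicted shares always sum to 100, so the model cannot predict a distribution that is inconsistent as a whole.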


Author(s):  
Б. В. Крыжановский ◽  
Н. Н. Смирнов ◽  
В. Ф. Никитин ◽  
Я. М. Карандашев ◽  
М. Ю. Мальсагов ◽  
...  

Combustion simulation is a key aspect of full-scale 3D simulation of current and advanced engines for aerospace propulsion systems. This work studies the possibility of solving chemical kinetics problems with artificial neural networks. The training datasets were generated by classical numerical methods. By choosing among various multi-layer neural network architectures and tuning their parameters, we developed a fairly simple model that can solve the problem. The resulting neural network operates in recursive mode and can predict the behavior of a multi-species chemical dynamic system over many steps.
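The recursive mode described here can be sketched as a rollout loop: the model maps the current state to the next one, and each prediction is fed back in as the next input. The sketch below is an illustration under stated assumptions, not the authors' model. `toy_step` is a hypothetical stand-in for the trained network (a first-order conversion of species A into species B), and the rate and step count are arbitrary.

```python
def rollout(step_model, initial_state, n_steps):
    """Apply a one-step model recursively, feeding each output back as input."""
    states = [list(initial_state)]
    for _ in range(n_steps):
        states.append(step_model(states[-1]))
    return states


def toy_step(state, rate=0.1):
    """Hypothetical stand-in for the trained network: one step of A -> B."""
    a, b = state
    converted = rate * a
    return [a - converted, b + converted]


# Start with pure species A and predict 50 steps ahead.
trajectory = rollout(toy_step, [1.0, 0.0], 50)
print(trajectory[-1])
```

Because each step consumes the previous prediction, any one-step error is carried forward, which is why accuracy over many steps is the relevant measure for a model used this way.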


2020 ◽  
Vol 2020 (10) ◽  
pp. 54-62
Author(s):  
Oleksii VASYLIEV ◽  

This paper considers the problem of applying neural networks to calculate the ratings that banks use when deciding whether to grant loans to borrowers. The task is to determine the borrower's rating function from a set of statistical data on the performance of loans the bank has granted. Constructing a regression model for the rating function requires knowing its general form; the task then reduces to calculating the parameters that appear in the expression for the rating function. With neural networks, by contrast, there is no need to specify the general form of the rating function. Instead, a neural network architecture is chosen and its parameters are calculated from the statistical data. Importantly, the same architecture can be used to process different sets of statistical data. The disadvantages of neural networks include the need to calculate a large number of parameters and the lack of a universal algorithm for determining the optimal architecture. As an example of using neural networks to determine a borrower's rating, a model system is considered in which the rating is given by a known non-analytical rating function. A neural network with two inner layers, containing three and two neurons respectively and using a sigmoid activation function, is used for modeling. It is shown that the neural network reconstructs the borrower's rating function with acceptable accuracy.
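The architecture named in the abstract (two inner layers of three and two sigmoid neurons) can be sketched as a plain forward pass. The abstract does not give the number of input features, the output layer, or the training procedure, so the four inputs, the single linear output neuron, and the random untrained weights below are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Create n_out neurons, each a (weights, bias) pair with random values."""
    return [([rng.uniform(-1, 1) for _ in range(n_in)], rng.uniform(-1, 1))
            for _ in range(n_out)]


def forward(layers, features):
    """Sigmoid on the inner layers, linear output (the output choice is assumed)."""
    activations = features
    for index, layer in enumerate(layers):
        pre = [sum(w * a for w, a in zip(weights, activations)) + bias
               for weights, bias in layer]
        is_output = index == len(layers) - 1
        activations = pre if is_output else [sigmoid(z) for z in pre]
    return activations[0]


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(weights) + 1 for layer in layers for weights, _ in layer)


rng = random.Random(0)
N_FEATURES = 4  # assumed number of borrower features; not given in the abstract
layers = [init_layer(N_FEATURES, 3, rng),  # first inner layer: three neurons
          init_layer(3, 2, rng),           # second inner layer: two neurons
          init_layer(2, 1, rng)]           # assumed single rating output
rating = forward(layers, [0.2, 0.5, 0.1, 0.9])
print(count_parameters(layers))
```

Under these assumptions even this small network has 26 parameters, which illustrates the parameter-count drawback the abstract mentions.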

