METHOD OF SYSTEM ENGINEERING OF NEURAL MACHINE TRANSLATION SYSTEMS
Background. Few machine translation products on the market are in real demand. Examples, both free and commercial, include “Google Translate”, “DeepL Translator”, “ModernMT”, “Apertium”, and “Trident”. Developing high-quality neural machine translation systems (NMTS) efficiently requires scientifically grounded methods of NMTS engineering, so that a high-quality, competitive product can be obtained as quickly as possible. Objective. The purpose of this article is to apply the Eriksson-Penker business profile to the development and formalization of a method for system engineering of NMTS. Methods. The idea behind the method is to apply the Eriksson-Penker system engineering methodology and business profile to formalize an ordered way of developing NMTS. Results. The proposed method of developing NMTS with system engineering techniques consists of three main stages. At the first stage, the structure of the NMTS is modelled in the form of an Eriksson-Penker business profile. At the second stage, the set of processes is determined that is specific to the class of Data Science systems and follows the international CRISP-DM standard. At the third stage, the developed NMTS is verified and validated. Conclusions. The article proposes a method of system engineering of NMTS based on a modified Eriksson-Penker business profile, which represents the system at the meta-level, and on international process standards for Data Science and Data Mining. The effectiveness of the method was studied on the example of developing a bidirectional English-Ukrainian NMTS, EUMT (English-Ukrainian Machine Translator); the quality of its English-Ukrainian translation was found to be at least as good as that of the popular Google Translate.
The full source code of the EUMT system is published on GitHub and is available at: https://github.com/EugeneSel/EUMT.
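The three stages named in the Results can be pictured as an ordered plan. The sketch below is illustrative only and is not taken from the EUMT repository: the `EngineeringPlan` class, the `build_plan` function, and the profile keys (`goals`, `resources`, `processes`, `rules`) are hypothetical names chosen for this example, and the modified profile used in the article may be organized differently. The six CRISP-DM phases are listed in their usual order.

```python
from dataclasses import dataclass, field

# The six phases of the CRISP-DM process model, in their usual order.
CRISP_DM_PHASES = (
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
)


@dataclass
class EngineeringPlan:
    """Hypothetical container for the artifacts of the three stages."""
    system_name: str
    business_profile: dict = field(default_factory=dict)
    processes: list = field(default_factory=list)
    checks: list = field(default_factory=list)


def build_plan(system_name, goals, resources):
    """Assemble a three-stage plan (a sketch, not the authors' code)."""
    plan = EngineeringPlan(system_name)

    # Stage 1: describe the system structure as a business profile.
    # The key names are an assumption made for this illustration.
    plan.business_profile = {
        "goals": list(goals),
        "resources": list(resources),
        "processes": [],
        "rules": [],
    }

    # Stage 2: fix the set of processes, here taken from CRISP-DM.
    plan.processes = list(CRISP_DM_PHASES)
    plan.business_profile["processes"] = plan.processes

    # Stage 3: schedule verification and validation of the built system.
    plan.checks = ["verification", "validation"]
    return plan


plan = build_plan(
    "EUMT",
    goals=["bidirectional English-Ukrainian translation"],
    resources=["parallel corpus"],
)
for step, phase in enumerate(plan.processes, start=1):
    print(step, phase)
```

Keeping the stages in one function makes their order explicit: the profile is created first, the process set is attached to it, and the checks are scheduled last.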