An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Author(s):  
Roberta Rodrigues de Lima ◽  
Anita M. R. Fernandes ◽  
James Roberto Bombasar ◽  
Bruno Alves da Silva ◽  
Paul Crocker ◽  
...  

The classification of goods involved in international trade in Brazil is based on the Mercosur Common Nomenclature (NCM). Classifying these goods is a real challenge because of the complexity involved in assigning the correct category codes, especially considering the legal and fiscal implications of misclassification. This work focuses on training a classifier based on Bidirectional Encoder Representations from Transformers (BERT) for the tax classification of goods with NCM codes. In particular, this article presents results from using a BERT model tuned specifically for the Portuguese language, as well as results from using a Multilingual BERT. Experimental results justify the use of these models in the classification process and also show that the language-specific model performs slightly better.

2022 ◽  
Vol 6 (1) ◽  
pp. 8
Author(s):  
Roberta Rodrigues de Lima ◽  
Anita M. R. Fernandes ◽  
James Roberto Bombasar ◽  
Bruno Alves da Silva ◽  
Paul Crocker ◽  
...  

Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with MCN codes, which are the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
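As an illustration of the metric reported above, the following is a minimal sketch of the multiclass Matthews correlation coefficient computed from true and predicted labels. It is not the authors' evaluation code; the function name `multiclass_mcc` and the placeholder labels "A", "B", and "C" are assumptions made for this example and do not correspond to real NCM codes or to the paper's data.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (hypothetical helper).

    Uses the standard generalisation:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples truly in class k, and p_k the number of
    samples predicted as class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_k = Counter(y_true)  # Counter returns 0 for labels it has not seen
    p_k = Counter(y_pred)
    labels = set(t_k) | set(p_k)

    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_k.values())
    cov_pred = s * s - sum(v * v for v in p_k.values())
    denominator = math.sqrt(cov_true * cov_pred)

    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is predicted as the same class).
    return numerator / denominator if denominator else 0.0


# Made-up labels standing in for category codes (illustrative only).
truth = ["A", "A", "B", "B", "C", "C"]
guess = ["A", "A", "B", "C", "C", "C"]
score = multiclass_mcc(truth, guess)
```

The coefficient ranges from -1 to 1, with 1 indicating perfect agreement and 0 indicating performance no better than chance, which is why it is often preferred over plain accuracy when classes are imbalanced.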


Author(s):  
A.S. FETISOV ◽  
V.O. TYURIN

The article presents a classification of magnetorheological devices. A classification of bearings for rotor machines is given. An experimental stand that includes a magnetorheological journal bearing is described, along with its information and measuring system. The results of the experimental study are presented.


Author(s):  
I. Kotlyarov

The paper analyzes the existing types of outsourcing. It is demonstrated that outsourcing can be analyzed from managerial and economic points of view. A classification of types of outsourcing based on their economic nature is proposed. Distinctive features of outsourcing are highlighted. Models of interaction between companies in the case of outsourcing are described.


Author(s):  
Wei Du ◽  
Haiyan Zhu ◽  
Teeraporn Saeheaw

Based on the LDA model, this paper builds a three-layer “document-topic-keyword” semantic model of Web English educational resources, models the semantic topics of resource documents, and obtains the semantic topics and keywords of document resources as the semantic labels of resources. The experimental results show that LDA topic modeling of documents is useful for the macroscopic classification and cataloging of Web English educational resources, highlighting teaching priorities, difficulties, and interrelationships, while LDA modeling of teaching topics with the same teaching content expands the metadata generation method of resource description based on the basic education metadata standard and provides more information about the inherent characteristics of resources. The semantic information can be used to mine the semantic thematic features and detailed differences inherent in the resources, and the final performance analysis verifies the parallel computing advantages of the LDA model in a big data environment.


2021 ◽  
pp. 2150151
Author(s):  
Dasong Sun

By clustering feature words, we can not only reduce the dimension of feature subsets, but also eliminate redundancy among the features. However, for a feature set with very large dimensions, it is difficult for the traditional k-medoids algorithm to accurately estimate the value of k. Moreover, the clustering results of the average linkage (AL) algorithm cannot be divided again, and the AL algorithm cannot be directly used for text classification. In order to overcome the limitations of AL and k-medoids, in this paper, we combine the two algorithms so that they complement each other. In particular, to serve the purpose of text classification, we improve the AL algorithm and propose the [Formula: see text] testing statistics to obtain the approximate number of clusters. Finally, the central feature words are preserved, and the other feature words are deleted. The experimental results show that the new algorithm largely eliminates feature redundancy. Compared with the traditional TF-IDF algorithms, the text classification performance of the new algorithm is improved.


1972 ◽  
Vol 1 (13) ◽  
pp. 62
Author(s):  
H. Raman

Laboratory studies were conducted in an attempt to find a relationship between beach and wave characteristics when equilibrium conditions are reached in beach-wave interaction, for the simple case of regular waves acting normal to the beach. Experimental results indicate the existence of stable points on beach profiles where the coordinates of the profile do not change with time when waves of constant characteristics act on the beach. Empirical relationships between the wave and beach properties are proposed. A new criterion for the classification of beach profiles is indicated.


2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Alejandra Cruz-Bernal ◽  
Martha M. Flores-Barranco ◽  
Dora L. Almanza-Ojeda ◽  
Sergio Ledesma ◽  
Mario A. Ibarra-Manzano

In mammograms, a calcification is represented as a small but bright white region of the digital image. Early detection of malignant calcifications gives patients a high expectation of surviving this disease. Nevertheless, white regions are difficult to see by visual inspection because a mammogram is a gray-scale image of the breast. To help radiologists detect abnormal calcifications, computer-inspection methods for mammograms have been proposed; however, this remains an important open issue. In this context, we propose a strategy for detecting calcifications in mammograms based on the analysis of the cluster prominence (cp) feature histogram. The highest frequencies of the cp histogram describe the calcifications on the mammography. Therefore, we obtain a function that models the behaviour of the cp histogram using the Vandermonde interpolation twice. The first interpolation yields a global representation, and the second models the highest frequencies of the histogram. A weak classifier is used to obtain a final classification of the mammography, that is, with or without calcifications. Experimental results are compared with real DICOM images and their corresponding diagnosis provided by expert radiologists, showing that the cp feature is highly discriminative.
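For readers unfamiliar with the interpolation step named above, the following is a minimal sketch of generic polynomial interpolation via a Vandermonde system. It does not reproduce the authors' two-stage procedure, whose details are not given in the abstract; the function names `vandermonde_coefficients` and `evaluate` and the sample points are assumptions made for illustration.

```python
def vandermonde_coefficients(xs, ys):
    """Fit the unique degree-(n-1) polynomial through n points.

    Solves V a = y, where V[i][j] = xs[i] ** j, using Gaussian elimination
    with partial pivoting. Returns coefficients in increasing order of power.
    (Hypothetical helper, written for this example.)
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    # Augmented matrix [V | y].
    a = [[float(x) ** j for j in range(n)] + [float(y)] for x, y in zip(xs, ys)]

    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("sample points must be distinct")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - known) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Made-up (bin, count) samples standing in for a few histogram points.
bins = [0, 1, 2]
counts = [1, 6, 17]
coeffs = vandermonde_coefficients(bins, counts)
```

Vandermonde systems become ill-conditioned as the number of points grows, so a sketch like this is only reasonable for a small number of sample points.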


Author(s):  
Kazuo Takeya ◽  
Yasuo Oteki ◽  
Hajime Yasui

The outline of plans for the research and development of an advanced reheat gas turbine under the Moonlight Project (Agency of Industrial Science and Technology, Ministry of International Trade and Industry) was announced in 1981 at Houston (81-GT-28), while technical problems related to the pilot plant (Paper No. 83-TOKYO-IGTC-117) as well as performance and characteristics (Paper No. 83-TOKYO-IGTC-40) were presented at the 1983 Tokyo International Gas Turbine Congress. No-load shop tests conducted on the pilot reheat gas turbine from May to July 1983 were completed with highly satisfactory results, so this paper is devoted primarily to describing the shop tests.


2020 ◽  
Vol 11 (2) ◽  
pp. 66-81
Author(s):  
Badia Klouche ◽  
Sidi Mohamed Benslimane ◽  
Sakina Rim Bennabi

Sentiment analysis is an emerging research area concerned with the classification of sentiment polarity and text mining, particularly given the considerable number of opinions available on social media. The Algerian telephone operator Ooredoo, like other operators, aims in its new strategy to win new customers by exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules allowing the transliteration from Algerian Arabizi to Algerian dialect. Furthermore, the authors use Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from the official pages of Ooredoo, written in a multilingual and multi-dialect context. Experimental results show that the system obtains good performance, with 83% accuracy.

