An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

The classification of goods involved in international trade in Brazil is based on the Mercosur Common Nomenclature (NCM). Classifying these goods is a real challenge because of the complexity of assigning the correct category codes, especially given the legal and fiscal implications of misclassification. This work focuses on training a classifier based on Bidirectional Encoder Representations from Transformers (BERT) for the tax classification of goods with NCM codes. In particular, this article presents results from a BERT model tuned specifically for Portuguese as well as results from a multilingual BERT model. The experimental results support the use of these models in the classification process and show that the language-specific model performs slightly better.

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Big Data and Cognitive Computing ◽

10.3390/bdcc6010008 ◽

2022 ◽

Vol 6 (1) ◽

pp. 8

Author(s):

Roberta Rodrigues de Lima ◽

Anita M. R. Fernandes ◽

James Roberto Bombasar ◽

Bruno Alves da Silva ◽

Paul Crocker ◽

...

Keyword(s):

International Trade ◽

Great Promise ◽

Classification Problems ◽

Empirical Comparison ◽

Legal Implications ◽

Supervised Learning Algorithms ◽

Import And Export ◽

Correct Category ◽

Real Challenge

Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil is a real challenge because of the complexity of assigning the correct category code to a good, especially given the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, which are the official classification system for import and export products in Brazil. In particular, this article presents results from a Portuguese-language-pretrained BERT model, as well as results from a multilingual-pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
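
The MCC reported above has a multiclass generalization, which matters when every NCM code is a separate label. The following is a minimal standard-library sketch, assuming the usual confusion-matrix formulation (the one scikit-learn's `matthews_corrcoef` documents for the multiclass case); the abstract does not say which formulation the authors used, and the function name `multiclass_mcc` is hypothetical.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient generalized to K classes.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where c = correctly classified samples, s = total samples,
    t_k = true occurrences of class k, p_k = predictions of class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)   # t_k per class
    p_counts = Counter(y_pred)   # p_k per class
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    # Degenerate case (e.g. only one class present): define MCC as 0.0.
    return cov_tp / denom if denom else 0.0
```

For example, with hypothetical labels `y_true = ["A", "A", "B", "B", "C", "C"]` and `y_pred = ["A", "A", "B", "C", "C", "C"]`, the function returns 18/√528, about 0.78. With two classes the formula reduces to the familiar binary MCC, and unlike plain accuracy it stays informative when some codes are much rarer than others.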

MAGNETORHEOLOGICAL JOURNAL BEARING: EXPERIMENTAL RESULTS

Fundamental and Applied Problems of Engineering and Technology ◽

10.33979/2073-7408-2020-341-3-83-90 ◽

2020 ◽

Vol 3 ◽

pp. 83-90

Author(s):

A.S. FETISOV ◽

V.O. TYURIN

Keyword(s):

Experimental Study ◽

Journal Bearing ◽

Experimental Results ◽

Measuring System ◽

Information Measuring ◽

Experimental Stand ◽

Information Measuring System

The article presents a classification of magnetorheological devices and a classification of bearings for rotor machines. An experimental stand that includes a magnetorheological journal bearing is described, along with the stand's information-measuring system. The results of the experimental study are presented.

Forms of Outsourcing in Modern International Trade

World Economy and International Relations ◽

10.20542/0131-2227-2011-6-65-72 ◽

2011 ◽

pp. 65-72

Author(s):

I. Kotlyarov

Keyword(s):

International Trade ◽

Distinctive Features ◽

Points Of View

The paper analyzes the existing types of outsourcing. It demonstrates that outsourcing can be analyzed from managerial and economic points of view and proposes a classification of types of outsourcing based on their economic nature. The distinctive features of outsourcing are highlighted, and models of interaction between companies engaged in outsourcing are described.

Application of the LDA Model to Semantic Annotation of Web-based English Educational Resources

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2047 ◽

2021 ◽

Author(s):

Wei Du ◽

Haiyan Zhu ◽

Teeraporn Saeheaw

Keyword(s):

Topic Modeling ◽

Basic Education ◽

Semantic Annotation ◽

Experimental Results ◽

Educational Resources ◽

Metadata Standard ◽

Web Based ◽

Teaching Content ◽

Macroscopic Classification

Based on the LDA model, this paper builds a three-layer “document-topic-keyword” semantic model of Web-based English educational resources, models the semantic topics of the resource documents, and uses the resulting topics and keywords as the resources' semantic labels. The experimental results show that LDA topic modeling of documents is useful for the macroscopic classification and cataloging of Web English educational resources, highlighting teaching priorities, difficulties, and interrelationships. LDA modeling of teaching topics that share the same teaching content extends the metadata generation method for resource description based on the basic education metadata standard and provides more information about the inherent characteristics of the resources. The semantic information can be used to mine the thematic features and fine-grained differences inherent in the resources, and the final performance analysis confirms the parallel-computing advantages of the LDA model in a big data environment.
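
The “document-topic-keyword” layers map onto LDA's two distributions: topics per document (θ) and words per topic (φ). The following is a minimal standard-library sketch, assuming collapsed Gibbs sampling as the inference method (the abstract does not say which inference the authors used) and documents that are already tokenized; `lda_gibbs`, `top_keywords`, and the hyperparameter defaults are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi): theta[d][k] = P(topic k | document d),
    phi[k][w] = P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]                # document-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic
    z = []                                                 # topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)                  # random initial topic
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


def top_keywords(phi_k, n=3):
    """Highest-probability words of one topic, usable as semantic labels."""
    return sorted(phi_k, key=phi_k.get, reverse=True)[:n]
```

The top keywords of a document's highest-weight topics (by θ) can then act as its semantic labels. Collapsed Gibbs sampling was chosen here only because it fits in a few dozen lines of plain Python; variational inference is the other common option.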

Efficient text feature extraction by integrating the average linkage and K-medoids clustering

Modern Physics Letters B ◽

10.1142/s0217984921501517 ◽

2021 ◽

pp. 2150151

Author(s):

Dasong Sun

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Experimental Results ◽

The Other ◽

Central Feature ◽

Number Of Clusters ◽

Average Linkage ◽

Text Feature

Clustering feature words both reduces the dimensionality of feature subsets and eliminates feature redundancy. However, for a feature set with very high dimensionality, it is difficult for the traditional K-medoids algorithm to estimate the value of K accurately. Moreover, the clusters produced by the average linkage (AL) algorithm cannot be re-partitioned, so AL cannot be used directly for text classification. To overcome these limitations of AL and K-medoids, this paper combines the two algorithms so that each complements the other. In particular, to serve the purpose of text classification, we improve the AL algorithm and propose the [Formula: see text] test statistic to obtain the approximate number of clusters. Finally, the central feature words are preserved and the other feature words are deleted. The experimental results show that the new algorithm largely eliminates feature redundancy and that, compared with traditional TF-IDF algorithms, it improves text classification performance.
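
The combination can be sketched as a two-stage procedure: average linkage decides how many clusters there are and seeds them, and K-medoids then refines the partition, keeping each medoid as a central feature word. This is a minimal sketch under stated assumptions, not the paper's algorithm: the feature words are assumed to come with a precomputed pairwise distance matrix, and a fixed merge `threshold` stands in for the paper's test statistic, which the abstract does not specify. All function names are hypothetical.

```python
def average_linkage(dist, threshold):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance, stopping once that distance exceeds `threshold` (a
    hypothetical stand-in for the paper's test statistic).
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # y > x, so deleting y is safe
        del clusters[y]
    return clusters


def medoid(dist, members):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def k_medoids(dist, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Each medoid starts its own cluster; every other item joins the nearest medoid.
        clusters = {m: [m] for m in medoids}
        for i in range(len(dist)):
            if i not in clusters:
                nearest = min(medoids, key=lambda m: dist[i][m])
                clusters[nearest].append(i)
        new = [medoid(dist, members) for members in clusters.values()]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return medoids


def central_features(dist, threshold):
    """AL fixes the number of clusters and the seeds; K-medoids refines them."""
    seeds = [medoid(dist, c) for c in average_linkage(dist, threshold)]
    return sorted(k_medoids(dist, seeds))
```

`central_features` returns the indices of the feature words to keep; every other feature word would be deleted. The naive average-linkage loop recomputes all pairwise cluster distances on every merge, so it is only suitable for small examples.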

Transformers for Multi-label Classification of Medical Text: An Empirical Comparison

Artificial Intelligence in Medicine - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77211-6_12 ◽

2021 ◽

pp. 114-123

Author(s):

Vithya Yogarajan ◽

Jacob Montiel ◽

Tony Smith ◽

Bernhard Pfahringer

Keyword(s):

Empirical Comparison ◽

Medical Text

EQUILIBRIUM CONDITIONS IN BEACH WAVE INTERACTION

Coastal Engineering Proceedings ◽

10.9753/icce.v13.62 ◽

1972 ◽

Vol 1 (13) ◽

pp. 62 ◽

Cited By ~ 1

Author(s):

H. Raman

Keyword(s):

Wave Interaction ◽

Experimental Results ◽

Laboratory Studies ◽

Regular Waves ◽

Equilibrium Conditions ◽

Wave Characteristics ◽

New Criterion ◽

Constant Characteristics ◽

Stable Points

Laboratory studies were conducted to find a relationship between beach and wave characteristics once equilibrium is reached in beach-wave interaction, for the simple case of regular waves acting normal to the beach. The experimental results indicate the existence of stable points on beach profiles, where the profile coordinates do not change with time when waves of constant characteristics act on the beach. Empirical relationships between the wave and beach properties are proposed, and a new criterion for classifying beach profiles is indicated.

Analysis of the Cluster Prominence Feature for Detecting Calcifications in Mammograms

Journal of Healthcare Engineering ◽

10.1155/2018/2849567 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Alejandra Cruz-Bernal ◽

Martha M. Flores-Barranco ◽

Dora L. Almanza-Ojeda ◽

Sergio Ledesma ◽

Mario A. Ibarra-Manzano

Keyword(s):

Digital Image ◽

Visual Inspection ◽

Experimental Results ◽

Gray Scale ◽

High Expectation ◽

Weak Classifier ◽

White Region ◽

Gray Scale Image ◽

Global Representation

In mammograms, a calcification appears as a small but bright white region of the digital image. Early detection of malignant calcifications gives patients a good chance of surviving the disease. Nevertheless, these white regions are difficult to see by visual inspection because a mammogram is a gray-scale image of the breast. Computer-based inspection methods have been proposed to help radiologists detect abnormal calcifications; however, this remains an important open problem. In this context, we propose a strategy for detecting calcifications in mammograms based on the analysis of the cluster prominence (cp) feature histogram. The highest frequencies of the cp histogram describe the calcifications in the mammogram. We therefore obtain a function that models the behaviour of the cp histogram by applying Vandermonde interpolation twice: the first interpolation yields a global representation, and the second models the highest frequencies of the histogram. A weak classifier then produces the final classification of the mammogram, that is, with or without calcifications. The experimental results, compared against real DICOM images and their corresponding diagnoses provided by expert radiologists, show that the cp feature is highly discriminative.
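
Vandermonde interpolation fits the unique polynomial of degree n-1 through n points with distinct x values by solving the linear system V c = y, where V[i][j] = x_i^j. The following is a minimal standard-library sketch of that single step; the abstract does not say how many histogram points each of the two passes uses or how the highest-frequency region is selected, so those choices are left to the caller, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system (repeated x values?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def vandermonde_interpolate(xs, ys):
    """Coefficients c_0..c_{n-1} of the polynomial through (xs[i], ys[i])."""
    v = [[x ** j for j in range(len(xs))] for x in xs]  # V[i][j] = x_i^j
    return solve_linear(v, ys)


def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j."""
    return sum(c * x ** j for j, c in enumerate(coeffs))
```

Applying the technique twice would mean calling `vandermonde_interpolate` once on points sampled across the whole cp histogram and once on points from its highest-frequency region. Vandermonde systems become ill-conditioned as the number of points grows, so each pass is best kept to a handful of points.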

Current Status of Advanced Reheat Gas Turbine AGTJ-100A: Part 3 — Experimental Results of Shop Tests

Volume 4: Heat Transfer; Electric Power ◽

10.1115/84-gt-57 ◽

1984 ◽

Cited By ~ 1

Author(s):

Kazuo Takeya ◽

Yasuo Oteki ◽

Hajime Yasui

Keyword(s):

International Trade ◽

Research And Development ◽

Gas Turbine ◽

Pilot Plant ◽

Science And Technology ◽

Experimental Results ◽

Current Status ◽

Technical Problems ◽

Industrial Science

The outline of plans for the research and development of an advanced reheat gas turbine under the Moonlight Project (Agency of Industrial Science and Technology, Ministry of International Trade and Industry) was announced in 1981 at Houston (81-GT-28), and the technical problems related to the pilot plant (Paper No. 83-TOKYO-IGTC-117) as well as its performance and characteristics (Paper No. 83-TOKYO-IGTC-40) were presented at the 1983 Tokyo International Gas Turbine Congress. No-load shop tests conducted on the pilot reheat gas turbine from May to July 1983 were completed with highly satisfactory results, and this paper is dedicated primarily to describing those shop tests.

Ooredoo Rayek

International Journal of Technology Diffusion ◽

10.4018/ijtd.2020040105 ◽

2020 ◽

Vol 11 (2) ◽

pp. 66-81

Author(s):

Badia Klouche ◽

Sidi Mohamed Benslimane ◽

Sakina Rim Bennabi

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Text Mining ◽

Sentiment Analysis ◽

Experimental Results ◽

Support Vector ◽

Textual Data ◽

New Strategy ◽

Set Up

Sentiment analysis is a recent, emerging research area concerned with classifying sentiment polarity and with text mining, particularly given the considerable number of opinions available on social media. Like other operators, the Algerian telephone operator Ooredoo, as part of its new strategy to win new customers, exploits their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek” whose objective is to collect, transliterate, translate, and classify the textual data expressed by Ooredoo's customers. This article develops a set of rules for transliterating Algerian Arabizi into Algerian dialect. Furthermore, the authors used Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments, written in a multilingual and multi-dialect context, from Ooredoo's official pages. Experimental results show that the system achieves good performance, with 83% accuracy.
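
Of the two classifiers mentioned, Naïve Bayes is the simpler to illustrate. The following is a minimal multinomial Naïve Bayes sketch with Laplace smoothing, assuming comments have already been transliterated, translated, and tokenized; the class name `NaiveBayesPolarity` (not scikit-learn's `MultinomialNB`), the smoothing parameter `alpha`, and the toy tokens in the usage note are hypothetical, and the SVM classifier is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesPolarity:
    """Multinomial Naive Bayes over bag-of-words tokens, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.doc_counts = Counter(labels)            # documents per class (priors)
        self.word_counts = defaultdict(Counter)      # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        return self

    def _log_posterior(self, doc, c):
        total = sum(self.word_counts[c].values())
        denom = total + self.alpha * len(self.vocab)
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        for w in doc:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        """Return the polarity label with the highest log-posterior."""
        return max(self.classes, key=lambda c: self._log_posterior(doc, c))
```

For example, after training on hypothetical comments such as `["good", "fast", "network"]` labelled `"pos"` and `["bad", "slow", "network"]` labelled `"neg"`, `predict(["slow"])` favours `"neg"`. Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long comments.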
