multilingual text
Recently Published Documents


Total documents: 160 (five years: 43)

H-index: 12 (five years: 2)

2022 ◽  
pp. 987-1003
Author(s):  
H. T. Basavaraju ◽  
V.N. Manjunath Aradhya ◽  
D. S. Guru ◽  
H. B. S. Harish

Text in an image or video conveys more precise meaning and describes the content more clearly than any other high-level or low-level feature. Text detection remains a challenging research problem in computer vision, and complex backgrounds and arbitrary text orientation make it harder still. Multilingual text also exhibits more varied geometrical shapes than text in a single language. This article presents a simple yet effective approach to detecting text in arbitrarily oriented multilingual images and video. The proposed method employs the Laplacian of Gaussian to identify potential text information, then applies double line structure analysis to extract the true text candidates. The method is evaluated on five datasets (Hua's, arbitrarily oriented, multi-script robust reading competition (MRRC), MSRA, and video datasets) using precision, recall, and F-measure as performance measures. It is also tested on real-time video, with promising results.
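The Laplacian of Gaussian step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel size, the value of sigma, the zero-mean shift, the border handling, and the function names `log_kernel` and `apply_filter` are all assumptions chosen for illustration, and the double line structure analysis is not reproduced here.

```python
import math


def log_kernel(sigma, radius):
    """Build a (2*radius+1) x (2*radius+1) Laplacian-of-Gaussian kernel.

    The values follow (r^2 - 2*sigma^2) / sigma^4 * exp(-r^2 / (2*sigma^2)),
    i.e. the LoG up to a constant factor. The kernel is then shifted to sum
    to zero so that flat image regions give no response (an illustrative
    choice, not taken from the paper).
    """
    size = 2 * radius + 1
    s2 = sigma * sigma
    kernel = [[0.0] * size for _ in range(size)]
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            kernel[y + radius][x + radius] = (
                (r2 - 2.0 * s2) / (s2 * s2) * math.exp(-r2 / (2.0 * s2))
            )
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def apply_filter(image, kernel):
    """Filter a 2D list-of-lists image, replicating border pixels.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    h, w = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(i + dy, 0), h - 1)  # clamp to image rows
                    xx = min(max(j + dx, 0), w - 1)  # clamp to image columns
                    acc += image[yy][xx] * kernel[dy + radius][dx + radius]
            out[i][j] = acc
    return out


# Toy input: a bright one-pixel-wide vertical stroke on a dark background.
stroke = [[1.0 if j == 4 else 0.0 for j in range(9)] for _ in range(9)]
response = apply_filter(stroke, log_kernel(sigma=1.0, radius=3))

# The response is negative on the stroke and positive beside it; such sign
# changes (zero crossings) mark candidate stroke edges.
print(response[4][4] < 0 < response[4][2])
```

In this sketch a flat region produces a response of approximately zero, so only intensity transitions such as character strokes stand out as candidates for further analysis.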


Author(s):  
Yakobus Wiciaputra ◽  
Julio Young ◽  
Andre Rusli

With the large amount of text circulating on the internet, there is a need for solutions that help process text data for various purposes. In Indonesia, text on the internet is generally written in two languages, English and Indonesian. This research focuses on building a model that can classify text in more than one language, a task commonly known as multilingual text classification. The implementation uses the XLM-RoBERTa model. Applying the transfer learning concept behind XLM-RoBERTa, the study built a classification model for Indonesian texts using only the English News Dataset as training data, reaching a Matthews correlation coefficient (MCC) of 42.2%. The best results were obtained when the large Mixed News Dataset (108,190) was used for training: on the large English News Dataset (37,886) the model achieved an MCC of 90.8%, accuracy of 93.3%, precision of 93.4%, recall of 93.3%, and F1 of 93.3%, and on the large Indonesian News Dataset (70,304) it achieved an MCC of 86.4% with accuracy, precision, recall, and F1 all at 90.2%. Keywords: Multilingual Text Classification, Natural Language Processing, News Dataset, Transfer Learning, XLM-RoBERTa
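The abstract above reports the Matthews correlation coefficient alongside accuracy. As a rough illustration of how that score can be computed for a multiclass classifier, the sketch below uses the standard multiclass generalization of the coefficient. The function name `multiclass_mcc` and the toy labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Uses the confusion-matrix form
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k how often class k truly occurs and p_k how often it is predicted.
    Returns 0.0 when the denominator is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    sum_pt = sum(pred_counts[k] * true_counts[k] for k in labels)
    sum_pp = sum(pred_counts[k] ** 2 for k in labels)
    sum_tt = sum(true_counts[k] ** 2 for k in labels)
    denominator = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    if denominator == 0:
        return 0.0
    return (c * s - sum_pt) / denominator


# Hypothetical news-category labels, for illustration only.
gold = ["sports", "politics", "tech", "sports", "tech", "politics"]
predicted = ["sports", "politics", "sports", "sports", "tech", "tech"]
print(round(multiclass_mcc(gold, predicted), 3))
```

Unlike accuracy, this score accounts for how predictions are distributed across classes, which may explain why it is reported separately from accuracy in the results above.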


2021 ◽  
Vol 11 (21) ◽  
pp. 10267
Author(s):  
Puri Phakmongkol ◽  
Peerapon Vateekul

Question Answering (QA) is a natural language processing task in which a machine must understand a given context and answer a given question. Much QA research relies on the abundant resources available for English. Thai, however, is one of the languages with few labeled corpora for QA studies. According to previous studies, English QA models can achieve F1 scores above 90%, whereas Thai QA models reached only 70% in our baseline. In this study, we aim to improve the performance of Thai QA models by generating more question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), along with data preprocessing methods for Thai. With this method, more than 100 thousand question-answer pairs can be synthesized from the provided Thai Wikipedia articles. Using our synthesized data, we investigated many fine-tuning strategies to achieve the highest model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. The experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models, RoBERTa and mT5, on both datasets.
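The difference between Exact Match and an overlap-based F1 at different granularities, as discussed in the abstract above, can be sketched as follows. This is a generic illustration in the style of common extractive QA scoring, not the authors' evaluation script. The `tokenize` parameter is an assumption: a real syllable-level score for Thai would require a Thai syllable segmenter supplied by the caller, which is not included here.

```python
from collections import Counter


def exact_match(prediction, reference):
    """Return 1.0 only if the two answers are identical after trimming."""
    return float(prediction.strip() == reference.strip())


def overlap_f1(prediction, reference, tokenize=str.split):
    """F1 over the multiset overlap of units produced by `tokenize`.

    `tokenize` decides the granularity: whitespace words by default, or any
    caller-supplied function, e.g. a syllable segmenter (hypothetical here).
    """
    pred_units = tokenize(prediction)
    ref_units = tokenize(reference)
    if not pred_units or not ref_units:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_units == ref_units)
    overlap = sum((Counter(pred_units) & Counter(ref_units)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_units)
    recall = overlap / len(ref_units)
    return 2 * precision * recall / (precision + recall)


# A near-miss answer scores 0 on Exact Match but gets partial credit on F1.
print(exact_match("the red car", "red car"))
print(overlap_f1("the red car", "red car"))

# Character-level units stand in for a finer-grained segmenter in this toy
# example; passing `list` splits a string into characters.
print(overlap_f1("abcd", "abce", tokenize=list))
```

Passing a finer-grained `tokenize` function changes only the unit of comparison, which is the design choice the abstract argues matters for Thai, where word boundaries are not marked by spaces.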


2021 ◽  
Vol 5 (2) ◽  
pp. 131-140
Author(s):  
Isaak Papadopoulos

Translanguaging has been at the center of research and teaching activity over the last decade, and teachers seem to promote the use of all their students’ linguistic resources in linguistically and culturally diverse classrooms. Among the best practices for promoting translanguaging and the flexible use of students’ resources, reading multilingual texts is proposed as an important activity for students who are bombarded daily with a great variety of stimuli. More specifically, students come into contact with “texts” in every mode that are not only offered in their L1 but often include various linguistic codes, known or unknown to them. However, limited research has focused on reading texts with such multilingual wealth, which was a major motivation for this research initiative. This paper presents a study that investigated young learners’ reading strategies when approaching a multilingual text. Twenty-seven primary education students of Greek origin were given two different types of multilingual texts and asked to complete a specifically designed record protocol reflecting on their reading behavior. In the first phase, the students encountered a text given both in another language and in Greek; in the second phase, they were encouraged to read a text in which different languages were used. Within this context, an attempt was made to identify the students’ strategies before, during, and after reading in order to shed light on the multidimensional framework of reading a multilingual text. Analysis of the data revealed that students employed a great variety of reading strategies before beginning to read the text. Nevertheless, they did not seem to use while-reading and post-reading strategies to a great extent when given a multilingual text, a finding that invites more interest in developing students’ reading strategies through educational activities.


Author(s):  
Sze Pei Tan et al.

Machine learning systems play an important role in assisting engineers in their daily activities. Many jobs can now be automated, including the handling and processing of customers’ complaints before failure investigation can proceed. In this paper, we discuss a real-life challenge faced by manufacturing engineers at a multinational life science company. The paper presents a step-by-step methodology for multilingual translation and multiclass classification of Repair Codes. The solution allows manufacturing engineers to use a machine learning model to reduce the time spent manually translating the file row by row and verifying its Repair Codes.


Target ◽  
2021 ◽  
Author(s):  
Fernando Prieto Ramos ◽  
Diego Guzmán

Studies of institutional translation have traditionally focused on European Union (EU) institutions and legislative genres. In order to develop a more comprehensive characterization of translation at international organizations beyond EU supranational law, this study compares a full mapping of multilingual text production at EU institutions to that of two representative intergovernmental organizations (IGOs), the United Nations (UN) and the World Trade Organization (WTO), over three years (2005, 2010, and 2015) in three common official languages (English, French, and Spanish). The corpus-driven quantitative analysis and categorization of all texts from a legal-functional perspective corroborate the interconnection of a wide range of textual genres that perform, support, or derive from central law-making, monitoring, and adjudicative functions, or fulfill other administrative purposes. Reports of several subtypes and functions are the most common genre across the institutions. The findings also highlight interinstitutional variation that reflects the features of each legal order, in particular the prominence of hard law-making at the EU (with a high proportion of drafts and input documents) as opposed to larger translation volumes in monitoring procedures at the UN and the WTO. This mapping is considered instrumental to further analyze legal and other specialized translation practices in international institutional settings, and ultimately to inform translator training and translation quality management.

