Measuring Brazilian Portuguese Product Titles Similarity using Embeddings

2021 ◽  
Author(s):  
Alan da Silva Romualdo ◽  
Livy Real ◽  
Helena de Medeiros Caseli

Textual similarity deals with determining how similar two pieces of text are, considering their lexical (surface form) or semantic (meaning) closeness. In this paper we applied word embeddings to measure e-commerce product title similarity in Brazilian Portuguese. We generated some domain-specific word embeddings (using Word2Vec, FastText and GloVe) and compared them with general-domain models (word embeddings and BERT models). We concluded that the cosine similarity calculated using the domain-specific word embeddings was a good approach to distinguish between similar and non-similar products, but the multilingual BERT pre-trained model proved to be the best one.
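To make the cosine-similarity approach concrete, below is a minimal sketch using gensim and numpy. The model file name, tokenisation, and example titles are illustrative assumptions, not the paper's actual artifacts.

```python
# Hypothetical sketch: cosine similarity between two product titles using
# averaged word embeddings. The model file is an assumed stand-in for a
# domain-specific Word2Vec model trained on e-commerce titles.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("ecommerce_ptbr_w2v.txt")  # hypothetical file

def title_vector(title: str) -> np.ndarray:
    """Average the embeddings of the in-vocabulary tokens of a title."""
    tokens = [t for t in title.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(
    title_vector("smartphone samsung galaxy 128gb"),
    title_vector("celular samsung galaxy 128gb"),
)
print(f"title similarity: {sim:.3f}")  # high values suggest the same product
```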

2019 ◽  
Author(s):  
José Padarian ◽  
Ignacio Fuentes

Abstract. A large amount of descriptive information is available in most disciplines of geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings lie in a multi-dimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations, namely: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. Since this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific for geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. The resulting embedding and test suite will be made available for other researchers to use and expand.
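The analogy evaluation mentioned above can be illustrated with the standard vector-offset query in gensim; the model file and the example analogy below are assumptions for illustration, not items from the GeoVec test suite.

```python
# Illustrative analogy query: "granite is to igneous as limestone is to ?"
# using a word-embedding model loaded with gensim (hypothetical file name).
from gensim.models import KeyedVectors

embeddings = KeyedVectors.load_word2vec_format("geovec.txt")  # hypothetical file

candidates = embeddings.most_similar(
    positive=["igneous", "limestone"],
    negative=["granite"],
    topn=5,
)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")  # a good model ranks "sedimentary" near the top
```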


SOIL ◽  
2019 ◽  
Vol 5 (2) ◽  
pp. 177-187 ◽  
Author(s):  
José Padarian ◽  
Ignacio Fuentes

Abstract. A large amount of descriptive information is available in geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings, which encode information about a word and its linguistic relationships with other words, lie in a multidimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. As this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific for geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. We also presented an example where we successfully emulated part of a taxonomic analysis of soil profiles that was originally applied to soil numerical data, which would not be possible without the use of embeddings. The resulting embedding and test suite will be made available for other researchers to use and expand upon.
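As a rough sketch of how descriptive soil-profile text can be turned into numbers and compared, as in the taxonomic example mentioned above, one could average word vectors per description and compute pairwise cosine distances. The profile descriptions and model file below are illustrative assumptions.

```python
# Embed short soil-profile descriptions by averaging word vectors, then
# compare profiles with pairwise cosine distances (illustrative data).
import numpy as np
from gensim.models import KeyedVectors
from scipy.spatial.distance import pdist, squareform

embeddings = KeyedVectors.load_word2vec_format("geovec.txt")  # hypothetical file

profiles = {
    "profile_a": "dark clay loam with strong blocky structure",
    "profile_b": "sandy topsoil over weathered granite",
    "profile_c": "dark clayey horizon with prismatic structure",
}

def embed(description: str) -> np.ndarray:
    tokens = [t for t in description.lower().split() if t in embeddings]
    return np.mean([embeddings[t] for t in tokens], axis=0)

matrix = np.vstack([embed(text) for text in profiles.values()])
distances = squareform(pdist(matrix, metric="cosine"))
print(distances)  # small distances indicate descriptively similar profiles
```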


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 137309-137321 ◽ 
Author(s):  
Luca Cagliero ◽  
Moreno La Quatra

2020 ◽  
Vol 62 ◽  
pp. e020014
Author(s):  
Monica de Freitas Frias Chaves ◽  
Cilene Rodrigues

High levels of linguistic referential failures are associated with a liability to develop schizophrenia-spectrum disorders, and it has been shown that these failures can differentiate between healthy, high-schizotypal, and schizophrenic groups. Nevertheless, few investigations have focused on whether or not schizotypal traits in nonclinical populations can also impact linguistic reference. In Brazilian Portuguese, only one previous study (an acceptability judgement task) had been conducted, and its results suggest an association between schizotypal traits and a more rigid preference for assigning specific readings to definite singular DPs. Here, we present another experimental study in Brazilian Portuguese, a comprehension task designed to examine possible effects of schizotypal personality traits on the interpretation of definite singular DPs. The findings, in line with the previous results, support the conclusion that schizotypy does affect the interpretation of definite singular DPs in Brazilian Portuguese. Together, these two experiments suggest that schizotypal personality traits impact the integration of linguistic contextual information into the semantic meaning of definite DPs. This is consistent with the general hypothesis that schizotypy, similarly to schizophrenia, is associated with pragmatic difficulties. Yet, our results emphasize that the impact of schizotypal traits on pragmatics can be observed even in healthy (nonclinical) speakers.


Author(s):  
Farhad Bin Siddique ◽  
Dario Bertero ◽  
Pascale Fung

We propose a multilingual model to recognize Big Five personality traits from text data in four different languages: English, Spanish, Dutch and Italian. Our analysis shows that words having a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. Therefore, we propose a personality alignment method, GlobalTrait, which has a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait are close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to other low-resource languages and obtain better multilingual results than with simple monolingual and unaligned multilingual embeddings. Comparing our monolingual model to the multilingual CNN trained on personality-aligned embeddings, we achieve an average F-score increase from 65 to 73.4 (+8.4) across the three languages other than English. We also show relatively good performance in the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset.
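The general idea of mapping a source-language embedding space into the English space can be sketched with a generic orthogonal Procrustes alignment over anchor word pairs; this is only an illustration of cross-lingual alignment under assumed stand-in data, and the trait-specific GlobalTrait mapping described above may differ.

```python
# Generic cross-lingual alignment sketch: learn an orthogonal mapping W that
# brings source-language vectors close to their English counterparts.
import numpy as np

rng = np.random.default_rng(0)
dim, n_anchors = 300, 500

# Rows are embeddings of translation-pair anchor words (random stand-ins here).
source = rng.normal(size=(n_anchors, dim))   # e.g. Spanish vectors
target = rng.normal(size=(n_anchors, dim))   # corresponding English vectors

# Orthogonal Procrustes: W = argmin ||source @ W - target||_F with W orthogonal,
# solved via the SVD of source.T @ target.
u, _, vt = np.linalg.svd(source.T @ target)
mapping = u @ vt

aligned_source = source @ mapping  # source words now live in the English space
```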


Author(s):  
Artur Boronat

Abstract When model transformations are used to implement consistency relations between very large models, incrementality plays a cornerstone role in detecting and resolving inconsistencies efficiently when models are updated. Given a directed consistency relation between two models, the problem studied in this work consists in propagating model changes from a source model to a target model in order to ensure consistency while minimizing computational costs. The mechanism that enforces such consistency is called a consistency maintainer and, in this context, its scalability is a key non-functional requirement. State-of-the-art model transformation engines with support for incrementality normally rely on an observer pattern for linking model changes, also known as deltas, to the application of model transformation rules, in so-called dependencies, at run time. These model changes can then be propagated along an already executed model transformation. Only a few approaches to model transformation provide domain-specific languages for representing and storing model changes in order to enable their use in asynchronous, event-based execution environments. The principal contribution of this work is the design of a forward change propagation mechanism for incremental execution of model transformations, which decouples dependency tracking from change propagation using two innovations. First, the observer pattern-based model is replaced with dependency injection, decoupling domain models from consistency maintainers. Second, a standardized representation of model changes is reused, enabling interoperability with EMF-compliant tools, both for defining model changes and for processing them asynchronously. This procedure has been implemented in a model transformation engine whose performance has been evaluated experimentally using the VIATRA CPS benchmark. In the experiments performed, the new transformation engine shows gains of several orders of magnitude in the initial phase of the incremental execution of the benchmark model transformation; change propagation runs in real time for the model sizes that other tools can process and, in addition, the engine is able to handle much larger models.
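The decoupling described above can be pictured with a short, language-agnostic sketch (written here in Python for brevity): instead of observing the source model directly, the consistency maintainer receives a recorded list of deltas through its interface and propagates them forward. Class and method names are illustrative assumptions, not the engine's actual API.

```python
# Minimal sketch: the maintainer is driven by injected deltas rather than
# by observing the domain model, so deltas can also be consumed asynchronously.
from dataclasses import dataclass, field

@dataclass
class Change:
    """A serialisable model delta (roughly analogous to an EMF change description)."""
    element: str
    feature: str
    new_value: object

@dataclass
class ConsistencyMaintainer:
    """Propagates injected source-model changes to the target model."""
    target_model: dict = field(default_factory=dict)

    def propagate(self, changes: list[Change]) -> None:
        for change in changes:
            # Re-evaluate only the target entries touched by this delta.
            self.target_model[(change.element, change.feature)] = change.new_value

maintainer = ConsistencyMaintainer()
maintainer.propagate([Change("cps.node1", "hostType", "SmallHost")])
print(maintainer.target_model)
```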

