Measuring Brazilian Portuguese Product Titles Similarity using Embeddings

2021 ◽  
Author(s):  
Alan da Silva Romualdo ◽  
Livy Real ◽  
Helena de Medeiros Caseli

Textual similarity deals with determining how similar two pieces of text are, considering their lexical (surface form) or semantic (meaning) closeness. In this paper we applied word embeddings to measure e-commerce product title similarity in Brazilian Portuguese. We generated some domain-specific word embeddings (using Word2Vec, FastText and GloVe) and compared them with general-domain models (word embeddings and BERT models). We concluded that the cosine similarity calculated using the domain-specific word embeddings was a good approach to distinguish between similar and non-similar products, but the multilingual BERT pre-trained model proved to be the best one.
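To make the cosine-similarity approach concrete, below is a minimal sketch using gensim and numpy. The model file name, tokenisation, and example titles are illustrative assumptions, not the paper's actual artifacts.

```python
# Hypothetical sketch: cosine similarity between two product titles using
# averaged word embeddings. The model file is an assumed stand-in for a
# domain-specific Word2Vec model trained on e-commerce titles.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("ecommerce_ptbr_w2v.txt")  # hypothetical file

def title_vector(title: str) -> np.ndarray:
    """Average the embeddings of the in-vocabulary tokens of a title."""
    tokens = [t for t in title.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(
    title_vector("smartphone samsung galaxy 128gb"),
    title_vector("celular samsung galaxy 128gb"),
)
print(f"title similarity: {sim:.3f}")  # high values suggest the same product
```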

2019 ◽  
Author(s):  
José Padarian ◽  
Ignacio Fuentes

Abstract. A large amount of descriptive information is available in most disciplines of geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings lie in a multi-dimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations, namely: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. Since this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific for geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. The resulting embedding and test suite will be made available for other researchers to use and expand.
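The analogy evaluation mentioned above can be illustrated with the standard vector-offset query in gensim; the model file and the example analogy below are assumptions for illustration, not items from the GeoVec test suite.

```python
# Illustrative analogy query: "granite is to igneous as limestone is to ?"
# using a word-embedding model loaded with gensim (hypothetical file name).
from gensim.models import KeyedVectors

embeddings = KeyedVectors.load_word2vec_format("geovec.txt")  # hypothetical file

candidates = embeddings.most_similar(
    positive=["igneous", "limestone"],
    negative=["granite"],
    topn=5,
)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")  # a good model ranks "sedimentary" near the top
```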


SOIL ◽  
2019 ◽  
Vol 5 (2) ◽  
pp. 177-187 ◽  
Author(s):  
José Padarian ◽  
Ignacio Fuentes

Abstract. A large amount of descriptive information is available in geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings, which encode information about a word and its linguistic relationships with other words, lie in a multidimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. As this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific for geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. We also presented an example where we successfully emulated part of a taxonomic analysis of soil profiles that was originally applied to soil numerical data, which would not be possible without the use of embeddings. The resulting embedding and test suite will be made available for other researchers to use and expand upon.
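As a rough sketch of how descriptive soil-profile text can be turned into numbers and compared, as in the taxonomic example mentioned above, one could average word vectors per description and compute pairwise cosine distances. The profile descriptions and model file below are illustrative assumptions.

```python
# Embed short soil-profile descriptions by averaging word vectors, then
# compare profiles with pairwise cosine distances (illustrative data).
import numpy as np
from gensim.models import KeyedVectors
from scipy.spatial.distance import pdist, squareform

embeddings = KeyedVectors.load_word2vec_format("geovec.txt")  # hypothetical file

profiles = {
    "profile_a": "dark clay loam with strong blocky structure",
    "profile_b": "sandy topsoil over weathered granite",
    "profile_c": "dark clayey horizon with prismatic structure",
}

def embed(description: str) -> np.ndarray:
    tokens = [t for t in description.lower().split() if t in embeddings]
    return np.mean([embeddings[t] for t in tokens], axis=0)

matrix = np.vstack([embed(text) for text in profiles.values()])
distances = squareform(pdist(matrix, metric="cosine"))
print(distances)  # small distances indicate descriptively similar profiles
```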


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 137309-137321 ◽ 
Author(s):  
Luca Cagliero ◽  
Moreno La Quatra

2020 ◽  
Vol 62 ◽  
pp. e020014
Author(s):  
Monica de Freitas Frias Chaves ◽  
Cilene Rodrigues

High levels of linguistic referential failures are associated with a liability to develop schizophrenia-spectrum disorders, and it has been shown that these failures can differentiate between healthy, high-schizotypal, and schizophrenic groups. Nevertheless, few investigations have focused on whether or not schizotypal traits in nonclinical populations can also impact linguistic reference. In Brazilian Portuguese, only one previous study (an acceptability judgement task) had been conducted, and its results suggest an association between schizotypal traits and a more rigid preference for assigning specific readings to definite singular DPs. Here, we present another experimental study in Brazilian Portuguese, a comprehension task designed to examine possible effects of schizotypal personality traits on the interpretation of definite singular DPs. The findings, in line with the previous results, support the conclusion that schizotypy does affect the interpretation of definite singular DPs in Brazilian Portuguese. Together, these two experiments suggest that schizotypal personality traits impact the integration of linguistic contextual information into the semantic meaning of definite DPs. This is consistent with the general hypothesis that schizotypy, similarly to schizophrenia, is associated with pragmatic difficulties. Yet, our results emphasize that the impact of schizotypal traits on pragmatics can be observed even in healthy (nonclinical) speakers.


Author(s):  
Farhad Bin Siddique ◽  
Dario Bertero ◽  
Pascale Fung

We propose a multilingual model to recognize Big Five personality traits from text data in four different languages: English, Spanish, Dutch and Italian. Our analysis shows that words having a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. Therefore, we propose a personality alignment method, GlobalTrait, which has a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait are close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to other low-resource languages and obtain better multilingual results than with simple monolingual and unaligned multilingual embeddings. Comparing our monolingual model to the multilingual CNN trained on personality-aligned embeddings, we achieve an average F-score increase from 65 to 73.4 (+8.4) across the three languages other than English. We also show relatively good performance in the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset.
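The general idea of mapping a source-language embedding space into the English space can be sketched with a generic orthogonal Procrustes alignment over anchor word pairs; this is only an illustration of cross-lingual alignment under assumed stand-in data, and the trait-specific GlobalTrait mapping described above may differ.

```python
# Generic cross-lingual alignment sketch: learn an orthogonal mapping W that
# brings source-language vectors close to their English counterparts.
import numpy as np

rng = np.random.default_rng(0)
dim, n_anchors = 300, 500

# Rows are embeddings of translation-pair anchor words (random stand-ins here).
source = rng.normal(size=(n_anchors, dim))   # e.g. Spanish vectors
target = rng.normal(size=(n_anchors, dim))   # corresponding English vectors

# Orthogonal Procrustes: W = argmin ||source @ W - target||_F with W orthogonal,
# solved via the SVD of source.T @ target.
u, _, vt = np.linalg.svd(source.T @ target)
mapping = u @ vt

aligned_source = source @ mapping  # source words now live in the English space
```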


Author(s):  
Artur Boronat

Abstract When model transformations are used to implement consistency relations between very large models, incrementality plays a cornerstone role in detecting and resolving inconsistencies efficiently when models are updated. Given a directed consistency relation between two models, the problem studied in this work consists in propagating model changes from a source model to a target model in order to ensure consistency while minimizing computational costs. The mechanism that enforces such consistency is called a consistency maintainer and, in this context, its scalability is a key non-functional requirement. State-of-the-art model transformation engines with support for incrementality normally rely on an observer pattern for linking model changes, also known as deltas, to the application of model transformation rules, in so-called dependencies, at run time. These model changes can then be propagated along an already executed model transformation. Only a few approaches to model transformation provide domain-specific languages for representing and storing model changes in order to enable their use in asynchronous, event-based execution environments. The principal contribution of this work is the design of a forward change propagation mechanism for incremental execution of model transformations, which decouples dependency tracking from change propagation using two innovations. First, the observer pattern-based model is replaced with dependency injection, decoupling domain models from consistency maintainers. Second, a standardized representation of model changes is reused, enabling interoperability with EMF-compliant tools, both for defining model changes and for processing them asynchronously. This procedure has been implemented in a model transformation engine whose performance has been evaluated experimentally using the VIATRA CPS benchmark. In the experiments performed, the new transformation engine shows gains of several orders of magnitude in the initial phase of the incremental execution of the benchmark model transformation; change propagation runs in real time for the model sizes that other tools can process and, in addition, the engine is able to handle much larger models.
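The decoupling described above can be pictured with a short, language-agnostic sketch (written here in Python for brevity): instead of observing the source model directly, the consistency maintainer receives a recorded list of deltas through its interface and propagates them forward. Class and method names are illustrative assumptions, not the engine's actual API.

```python
# Minimal sketch: the maintainer is driven by injected deltas rather than
# by observing the domain model, so deltas can also be consumed asynchronously.
from dataclasses import dataclass, field

@dataclass
class Change:
    """A serialisable model delta (roughly analogous to an EMF change description)."""
    element: str
    feature: str
    new_value: object

@dataclass
class ConsistencyMaintainer:
    """Propagates injected source-model changes to the target model."""
    target_model: dict = field(default_factory=dict)

    def propagate(self, changes: list[Change]) -> None:
        for change in changes:
            # Re-evaluate only the target entries touched by this delta.
            self.target_model[(change.element, change.feature)] = change.new_value

maintainer = ConsistencyMaintainer()
maintainer.propagate([Change("cps.node1", "hostType", "SmallHost")])
print(maintainer.target_model)
```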

