Word Embeddings for Semantic Resemblance of Substantial Text Data: A Comparative Study

Author(s):  
Kazi Lutful Kabir ◽  
Fardina Fathmiul Alam ◽  
Anika Binte Islam
2021 ◽  
pp. 196-208

Author(s):  
Jose A. Diaz-Garcia ◽  
M. Dolores Ruiz ◽  
Maria J. Martin-Bautista

Author(s):  
Farhad Bin Siddique ◽  
Dario Bertero ◽  
Pascale Fung

We propose a multilingual model to recognize Big Five personality traits from text data in four languages: English, Spanish, Dutch and Italian. Our analysis shows that words with a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. We therefore propose a personality alignment method, GlobalTrait, which uses a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait lie close together in the multilingual vector space. By training on these aligned embeddings, we can transfer personality-related features from high-resource languages such as English to low-resource languages and obtain better multilingual results than with simple monolingual or unaligned multilingual embeddings. Comparing our monolingual model with a multilingual CNN using personality-aligned embeddings, the average F-score across the three non-English languages rises from 65 to 73.4 (+8.4). We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset.
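The abstract does not state how the GlobalTrait mapping is learned. As a rough illustration only, the sketch below fits one linear map per trait from source-language word vectors to English word vectors by minimising mean squared error with plain gradient descent. The function names (`fit_linear_map`, `fit_trait_maps`), the learning rate, and the toy 2-D training pairs are hypothetical; this is a generic least-squares alignment, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_map(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_map(src: Sequence[Vector], tgt: Sequence[Vector],
                   lr: float = 0.1, epochs: int = 500) -> Matrix:
    """Fit W so that W x_i is close to y_i, minimising (1/n) * sum ||W x_i - y_i||^2."""
    d_in, d_out, n = len(src[0]), len(tgt[0]), len(src)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, y in zip(src, tgt):
            err = [p - t for p, t in zip(apply_map(W, x), y)]
            for i in range(d_out):
                for j in range(d_in):
                    # Gradient of the mean squared error with respect to W[i][j].
                    grad[i][j] += 2.0 * err[i] * x[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W


def fit_trait_maps(pairs_by_trait: Dict[str, List[Tuple[Vector, Vector]]]) -> Dict[str, Matrix]:
    """Hypothetical per-trait alignment: one linear map per personality trait, trained
    only on (source-language vector, English vector) pairs for words tied to that trait."""
    maps: Dict[str, Matrix] = {}
    for trait, pairs in pairs_by_trait.items():
        maps[trait] = fit_linear_map([s for s, _ in pairs], [t for _, t in pairs])
    return maps
```

With the toy pairs `([1, 0] -> [0, 1])`, `([0, 1] -> [1, 0])` and `([1, 1] -> [1, 1])` for a single trait, the fitted map converges to the coordinate swap `[[0, 1], [1, 0]]`. Real embeddings would have hundreds of dimensions, where a linear-algebra library would replace these loops.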


2018 ◽  
Vol 7 (2.14) ◽  
pp. 5726
Author(s):  
Oumaima Hourrane ◽  
El Habib Benlahmar ◽  
Ahmed Zellou

Sentiment analysis is one of the newer and more engaging areas of natural language processing, having emerged alongside community sites on the web. Taking advantage of the amount of information now available, research and industry have been seeking ways to automatically analyze the sentiments expressed in texts. The main challenges for this task are the ambiguity of human language and the lack of labeled data. To address these issues, sentiment analysis has been combined with deep learning, whose models are effective because they learn features automatically. In this paper, we present a comparative study on the IMDB movie review dataset: we compare word embeddings and deep learning models for sentiment analysis and report broad empirical results for those interested in applying deep learning to sentiment analysis in real-world settings.
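The abstract does not say how the reviews are turned into model inputs. One common baseline for comparing pretrained word embeddings is to represent each review as the mean of its word vectors; the sketch below assumes that approach, whitespace tokenization, and a hypothetical `embeddings` dictionary of toy vectors.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str], embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the vectors of all in-vocabulary tokens; a zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

For example, with `embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}`, the review `"good movie xyz".split()` maps to `[0.5, 0.5]`; the out-of-vocabulary token is simply skipped. The resulting vectors could then be fed to any classifier.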


Author(s):  
Panos Panagiotou ◽  
George Kalpakis ◽  
Theodora Tsikrika ◽  
Stefanos Vrochidis ◽  
Ioannis Kompatsiaris

Author(s):  
Ravindra Babu Tallamaraju ◽  
Manas Kirti

With the falling cost of storage devices, increasing amounts of data are being stored and processed to extract intelligence. Classification and clustering have been two major approaches to generating data abstractions. Over the last few years, text has come to dominate the types of data shared and stored. Sources of such datasets include mobile data, e-commerce, and a wide range of continuously expanding social-networking services. Within each of these sources, the nature of the data differs drastically, from formal language text to Twitter or SMS slang, which calls for different ways of processing the data to produce meaningful summaries. Such summaries can be used effectively for business advantage. Processing such data requires identifying an appropriate set of features, for both efficiency and effectiveness. In this chapter, we discuss approaches to text feature selection and present a comparative study.
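The chapter abstract does not name the feature selection methods it compares. As one standard example, the sketch below scores each term with the chi-square statistic of its 2x2 term/class contingency table and keeps the top k terms. The function names and the toy documents are hypothetical.

```python
from typing import List, Sequence, Set


def chi_square(docs: Sequence[Set[str]], labels: Sequence[str], term: str, cls: str) -> float:
    """Chi-square statistic for the presence of `term` versus membership in class `cls`."""
    a = b = c = d = 0  # a: term & class, b: term & not class, c: class only, d: neither
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs: Sequence[Set[str]], labels: Sequence[str], cls: str, k: int) -> List[str]:
    """Return the k terms with the highest chi-square score (ties kept in alphabetical order)."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t, cls), reverse=True)
    return ranked[:k]
```

On four toy documents where "great" appears only in positive ones and "boring" only in negative ones, both terms score 4.0 while "plot" and "acting" score 0, so the top two features are `["boring", "great"]`.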


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

In the context of big data and the Industry 4.0 era, improving the efficiency of document/information retrieval frameworks to handle the ever-growing volume of text data in an increasingly digital world is essential. This article describes a two-stage document/information retrieval system. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single long retrieved document (or of a set of top retrieved documents). The results show that word embeddings are an excellent way to achieve higher precision and retrieve more accurate documents. The obtained summaries also satisfy the retention and fidelity criteria of relevant summaries.
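The abstract does not detail its embedding-based query expansion. A generic version adds, for each query term, its nearest neighbours in the embedding space by cosine similarity; the sketch below assumes that scheme, with hypothetical names (`expand_query`, `k`, `min_sim`) and toy vectors, and leaves out the Lucene side entirely.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def expand_query(query_terms: Sequence[str], embeddings: Dict[str, List[float]],
                 k: int = 1, min_sim: float = 0.0) -> List[str]:
    """Append up to k nearest-neighbour words per query term (above min_sim)."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # unknown terms are kept but not expanded
        neighbours = sorted(
            ((cosine(embeddings[term], vec), word)
             for word, vec in embeddings.items() if word not in query_terms),
            reverse=True)
        for sim, word in neighbours[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded
```

With toy vectors for "car", "automobile" and "banana", the query `["car"]` expands to `["car", "automobile"]`. Raising `min_sim` trades recall for precision by ignoring weak neighbours.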


Author(s):  
Nikolaos Bastas ◽  
George Kalpakis ◽  
Theodora Tsikrika ◽  
Stefanos Vrochidis ◽  
Ioannis Kompatsiaris

2017 ◽  
Vol 64 ◽  
pp. 432-439 ◽  
Author(s):  
Liang Yao ◽  
Yin Zhang ◽  
Qinfei Chen ◽  
Hongze Qian ◽  
Baogang Wei ◽  
...  