short text
Recently Published Documents


TOTAL DOCUMENTS

1138
(FIVE YEARS 494)

H-INDEX

31
(FIVE YEARS 9)

2022 ◽  
Vol 18 (2) ◽  
pp. 1-27
Author(s):  
Hang Cui ◽  
Tarek Abdelzaher

This article narrows the gap between physical sensing systems that measure physical signals and social sensing systems that measure information signals by (i) defining a novel algorithm for extracting information signals (building on results from text embedding) and (ii) showing that it increases the accuracy of truth discovery, the separation of true information from false or manipulated information. The work is applied in the context of separating true and false facts on social media, such as Twitter and Reddit, where users post predominantly short microblogs. The new algorithm decides how to aggregate the signal across words in the microblog for purposes of clustering the microblogs in the latent information signal space, where it is easier to separate true and false posts. Although previous literature extensively studied the problem of short text embedding/representation, this article improves previous work in three important respects: (1) Our work constitutes unsupervised truth discovery, requiring no labeled input or prior training. (2) We propose a new distance metric for efficient short text similarity estimation, which we call Semantic Subset Matching, that improves our ability to meaningfully cluster microblog posts in the latent information signal space. (3) We introduce an iterative framework that jointly improves microblog clustering and truth discovery. The evaluation shows that the approach improves the accuracy of truth discovery by 6.3%, 2.5%, and 3.8% (constituting a 38.9%, 14.2%, and 18.7% reduction in error, respectively) in three real Twitter data traces.
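
The link between the reported accuracy gains and error reductions can be checked with simple arithmetic. The sketch below assumes the gains are absolute percentage points and that error is 100% minus accuracy. The baseline error rates it prints are back-computed from the abstract's figures, not values reported by the authors, and the function names are illustrative.

```python
def relative_error_reduction(baseline_error: float, accuracy_gain: float) -> float:
    """Fraction of the baseline error removed by an absolute accuracy gain.

    Both arguments are in percentage points (e.g. 16.0 means 16%).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return accuracy_gain / baseline_error


def implied_baseline_error(accuracy_gain: float, error_reduction: float) -> float:
    """Invert the relation above: baseline error = gain / relative reduction."""
    if error_reduction <= 0:
        raise ValueError("error_reduction must be positive")
    return accuracy_gain / error_reduction


# (accuracy gain in points, relative error reduction) as quoted in the abstract.
REPORTED = [(6.3, 0.389), (2.5, 0.142), (3.8, 0.187)]

for gain, reduction in REPORTED:
    baseline = implied_baseline_error(gain, reduction)
    # Back-computed, hence approximate: the abstract rounds both inputs.
    print(f"gain {gain:.1f} pts, reduction {reduction:.1%} "
          f"-> implied baseline error of about {baseline:.1f}%")
```

Under these assumptions, each trace's baseline error works out to roughly one sixth to one fifth of posts, which is why a gain of a few points corresponds to a double-digit relative reduction in error.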


Author(s):  
Li-Ming Chen ◽  
Bao-Xin Xiu ◽  
Zhao-Yun Ding

Abstract For short text classification, insufficient labeled data, data sparsity, and imbalanced classification have become three major challenges. To address them, we propose multiple weak supervision, which can label unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditional independence model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
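
As a rough illustration of how probabilistic labels can come out of a conditional independence assumption, the sketch below combines binary votes from several weak sources in naive-Bayes fashion. It is not the authors' model: the per-source accuracies are assumed to be known (a real system would have to estimate them), and `ABSTAIN`, `probabilistic_label`, and the example values are hypothetical.

```python
import math

ABSTAIN = -1  # hypothetical marker for "this weak source gives no vote"


def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return P(y = 1 | votes), assuming sources are independent given y.

    votes:      list of 0, 1, or ABSTAIN, one entry per weak source
    accuracies: assumed probability that each source votes correctly,
                strictly between 0 and 1
    prior_pos:  assumed prior P(y = 1); lower it for a rare positive class
    """
    log_pos = math.log(prior_pos)
    log_neg = math.log(1.0 - prior_pos)
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # an abstaining source contributes no evidence
        # If y = 1, a vote of 1 is correct (prob acc); if y = 0, a vote of 0 is.
        log_pos += math.log(acc if vote == 1 else 1.0 - acc)
        log_neg += math.log(acc if vote == 0 else 1.0 - acc)
    # Normalize in log space to avoid underflow with many sources.
    top = max(log_pos, log_neg)
    pos = math.exp(log_pos - top)
    neg = math.exp(log_neg - top)
    return pos / (pos + neg)


# Three sources of equal assumed accuracy; two vote positive, one negative.
print(probabilistic_label([1, 1, 0], [0.8, 0.8, 0.8]))
```

The output is a soft label rather than a hard class, so a downstream classifier can weight uncertain examples less. The `prior_pos` argument is one simple place to encode class imbalance.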


Author(s):  
Junxian Wu ◽  
Xiaojun Chen ◽  
Shaotian Cai ◽  
Yongqi Li ◽  
Huzi Wu

2021 ◽  
Vol 17 (1) ◽  
pp. 1-19
Author(s):  
Zhihua Zhao ◽  
Zhihao Hao ◽  
Guancheng Wang ◽  
Dianhui Mao ◽  
Bob Zhang ◽  
...  

E-commerce has developed greatly in recent years, and its regulation has become one of the most important research areas for building a sustainable market. Analyzing the large volume of review data generated during shopping can facilitate regulation: the reviews are short texts, and their features are easy to extract with deep learning methods. Using these features, sentiment analysis of the review data can be carried out to obtain users’ emotional tendency toward a specific product. Regulators can formulate reasonable regulation strategies based on the analysis results. However, the data currently suffers from problems such as poor reliability and ease of tampering, which greatly affect the outcome and can lead regulators to make unreasonable regulatory decisions. Blockchain offers a possible solution to these problems because it is trustworthy, transparent, and immutable. Accordingly, blockchain can be applied for data storage, and a long short-term memory (LSTM) network can be employed to mine review data for emotional tendency analysis. To improve the accuracy of the results, we designed a method that helps the LSTM better understand text data such as reviews containing idioms. To demonstrate the effectiveness of the proposed method, different experiments were used for verification, with all results showing that the proposed method achieves a good outcome in sentiment analysis, helping regulators make better decisions.


2021 ◽  
Vol 10 (1-2) ◽  
pp. 152-173
Author(s):  
Ogunnaike Oludamini

Abstract This article presents an annotated translation of The Exposition of Devotions, a short text by Shaykh ʿAbd al-Qādir ibn Muṣtafā (1218–1280/1804–1864) about his spiritual master and maternal uncle, Muḥammad Sambo (1195–1242/1782–1826). Muḥammad Sambo was the son of ʿUthmān ibn Fūdī (also known as Usman dan Fodio), the founder of the Sokoto Caliphate, one of the largest pre-colonial polities on the African continent. While modern scholarship has tended to focus on the political, legal, social, and economic dimensions of the jihad movement that created the Sokoto Caliphate, this text provides a brief, but detailed account of the spiritual practices and discussions amongst Usman dan Fodio’s clan (the Fodiawa), demonstrating the centrality of the Akbarī tradition in technical discussions, as well as the unique developments of this tradition in thirteenth/nineteenth century West Africa. The work begins with an account of a dream of the then-deceased Muḥammad Sambo that occasioned its composition, and after a brief discussion of the status of dreams and their importance, gives an account of Sambo’s spiritual method and practices. The short treatise concludes with the author’s summary of Sambo’s responses to several technical and highly esoteric questions posed to him by the author, illustrating the profound mastery and unique perspectives developed on these topics by the Fodiawa. Combining oneirology, hagiography, practical and theoretical Sufism, this short treatise is an illuminating window into the spiritual and intellectual traditions of the founders of the Sokoto Caliphate.


2021 ◽  
Vol 2138 (1) ◽  
pp. 012024
Author(s):  
Tuo Shi ◽  
Na Wang ◽  
Lei Zhang

Abstract Traffic accident data held by traffic management departments is recorded as unstructured text, which contains a large number of characteristic descriptions related to risky driving behavior. However, such records are short and rich in professional vocabulary, so many text mining techniques cannot analyze them effectively. This paper proposes an improved LDA algorithm based on CBOW, the LDA-CBOW model, for the study of traffic accident text data containing illegal behaviors. This model can better extract the topics of traffic accident data and filter the keywords under the corresponding topics, which provides a better way to study the dependence relationship between traffic data and illegal behaviors. Experiments show that, compared to other models, this model can better extract related topics of traffic accident data with higher model efficiency and better robustness.


Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 352
Author(s):  
Ke Zhao ◽  
Lan Huang ◽  
Rui Song ◽  
Qiang Shen ◽  
Hao Xu

Short text classification is an important problem in natural language processing (NLP), and graph neural networks (GNNs) have been successfully used to solve different NLP problems. However, few studies employ GNNs for short text classification, and most of the existing graph-based models ignore sequential information (e.g., word order) in each document. In this work, we propose an improved sequence-based feature propagation scheme, which fully uses word representation and document-level word interaction and overcomes the limitations of textual features in short texts. On this basis, we utilize this propagation scheme to construct a lightweight model, sequential GNN (SGNN), and its extended model, ESGNN. Specifically, we build an individual graph for each document in the short text corpus based on word co-occurrence and use a bidirectional long short-term memory network (Bi-LSTM) to extract the sequential features of each document; therefore, word nodes in the document graph retain contextual information. Furthermore, two different simplified graph convolutional networks (GCNs) are used to learn word representations based on their local structures. Finally, word nodes combined with sequential information and local information are incorporated as the document representation. Extensive experiments on seven benchmark datasets demonstrate the effectiveness of our method.
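
The first step described above, building one word co-occurrence graph per document, can be sketched as follows. This is a minimal illustration only: the sliding-window size, the use of co-occurrence counts as edge weights, and the name `cooccurrence_graph` are assumptions, and the Bi-LSTM and GCN stages are omitted.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=3):
    """Build an undirected, weighted word graph for a single document.

    Two distinct words are linked if they appear within `window`
    consecutive tokens; the edge weight counts how often that happens.
    Returns a dict mapping a sorted (word_a, word_b) pair to its count.
    """
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        # Pair the word with the tokens that follow it inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # no self-loops in this sketch
            edges[tuple(sorted((word, other)))] += 1
    return dict(edges)


doc = "short text classification with graph neural networks".split()
for (a, b), weight in sorted(cooccurrence_graph(doc).items()):
    print(f"{a} -- {b}: {weight}")
```

Because each document gets its own small graph, the node set is just that document's vocabulary, which keeps the graphs small for short texts.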


Author(s):  
Lei Liu ◽  
Hao Chen ◽  
Yinghong Sun

Sentiment analysis of social media texts has become a research hotspot in information processing. Sentiment analysis methods that combine machine learning with a sentiment lexicon need to select features. The selected emotional features are often subjective, which can easily lead to overfitted models and poor generalization ability. Sentiment analysis models based on deep learning can automatically extract effective emotional features from text, which greatly improves the accuracy of text sentiment analysis. However, for lack of a multi-classification emotional corpus, such models cannot accurately express emotional polarity. Therefore, we propose a multi-classification sentiment analysis model, GLU-RCNN, based on Gated Linear Units and an attention mechanism. Our model uses the attention mechanism based on Gated Linear Units to integrate the local features extracted by the CNN with the semantic features extracted by the LSTM. The local features of short text are extracted and concatenated by using multi-size convolution kernels. At the classification layer, the emotional features extracted by the CNN and the LSTM are concatenated to express the emotional features of the text. The detailed evaluation on two benchmark datasets shows that the proposed model outperforms state-of-the-art approaches.
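
For readers unfamiliar with the gating operation named above, a Gated Linear Unit multiplies one half of its input by the sigmoid of the other half. The sketch below shows only that standard gate in plain Python. How GLU-RCNN wires the gate into its attention mechanism is not specified in the abstract and is not reproduced here, and the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(values):
    """Gated Linear Unit over a flat feature vector.

    The first half of `values` carries the content and the second half
    the gate: output[i] = content[i] * sigmoid(gate[i]).
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU input must have an even number of features")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [c * sigmoid(g) for c, g in zip(content, gate)]


# A gate value of 0 passes exactly half of each content feature through.
print(glu([2.0, 4.0, 0.0, 0.0]))
```

A strongly negative gate value suppresses its feature and a strongly positive one passes it through nearly unchanged, which is what lets a model learn which features to keep.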

