TEXT CLUSTERING FOR REDUCING SEMANTIC INFORMATION IN MALAY SEMANTIC REPRESENTATION

Author(s):  
Tuan Norhafizah Tuan Zakaria ◽  
Mohd Juzaiddin Ab Aziz ◽  
Mohd Rosmadi Mokhtar ◽  
Saadiyah Darus
2014 ◽  
Vol 971-973 ◽  
pp. 1747-1751 ◽  
Author(s):  
Lei Zhang ◽  
Hai Qiang Chen ◽  
Wei Jie Li ◽  
Yan Zhao Liu ◽  
Run Pu Wu

Text clustering is a popular research topic in text mining, and many text clustering methods now cater to different application requirements. Currently, Weibo data is acquired through the APIs provided by the major microblogging platforms. In this paper, we discuss algorithms for extracting popular topics posted by Weibo users by text clustering after massive data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, clustering is carried out by combining multiple posts into long texts based on their features (forwards, comments, followers, etc.). Either frequency-based or density-based short text clustering works in most cases. The former is suited to finding hot topics in large volumes of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy, and both improve clustering performance through parallelism.
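The merging step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the `thread_id` grouping key, the sample posts, and the whitespace tokenizer are all assumptions made for the example (real Weibo text would need a Chinese word segmenter), and the term-frequency count stands in for the frequency-based clustering, which the abstract does not specify.

```python
from collections import Counter, defaultdict


def merge_posts(posts):
    """Concatenate short posts that share a (hypothetical) thread_id into one long text."""
    merged = defaultdict(list)
    for post in posts:
        merged[post["thread_id"]].append(post["text"])
    return {tid: " ".join(texts) for tid, texts in merged.items()}


def top_terms(text, k=3):
    """Return the k most frequent whitespace-separated terms in a text."""
    counts = Counter(text.lower().split())
    return [term for term, _ in counts.most_common(k)]


# Made-up posts for illustration only.
posts = [
    {"thread_id": "a", "text": "flood warning downtown"},
    {"thread_id": "a", "text": "flood closes downtown roads"},
    {"thread_id": "b", "text": "concert tickets on sale"},
]

long_texts = merge_posts(posts)
hot_terms = {tid: top_terms(text, k=2) for tid, text in long_texts.items()}
```

Grouping before counting gives each merged text enough repeated terms for frequency statistics to be meaningful, which is the motivation the abstract gives for combining posts.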


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Xiaochao Fan ◽  
Hongfei Lin ◽  
Liang Yang ◽  
Yufeng Diao ◽  
Chen Shen ◽  
...  

Humor refers to the quality of being amusing. With the development of artificial intelligence, humor recognition is attracting a lot of research attention. Although phonetics and ambiguity have been introduced by previous studies, existing recognition methods still lack suitable feature design for neural networks. In this paper, we show that phonetic structure and the ambiguity associated with confusing words need to be learned as their own representations by the neural network. We then propose the Phonetics and Ambiguity Comprehension Gated Attention network (PACGA) to learn phonetic structures and semantic representations for humor recognition. The PACGA model represents phonetic information and the semantic information of ambiguous words well, which is of great benefit to humor recognition. Experimental results on two public datasets demonstrate the effectiveness of our model.


2019 ◽  
Vol 8 (8) ◽  
pp. 347 ◽  
Author(s):  
Stelios Vitalis ◽  
Ken Ohori ◽  
Jantien Stoter

3D city models are used extensively in applications such as evacuation scenarios and energy consumption estimation. The main standard for 3D city models is the CityGML data model, which can be encoded through the CityJSON data format. CityGML and CityJSON use polygonal modelling to represent geometries. True topological data structures have proven to be more computationally efficient for geometric analysis than polygonal modelling. In a previous study, we introduced a method to topologically reconstruct CityGML models while maintaining the semantic information of the dataset, based solely on the combinatorial map (C-Map) data structure. Because of the limitations of C-Map’s semantic representation mechanism, the resulting datasets could suffer either from loss of semantic information or from redundant repetition of it. In this article, we propose a solution for a more efficient representation of geometry, topology and semantics by incorporating the C-Map data structure into the CityGML data model and implementing a CityJSON extension to encode the C-Map data. In addition, we provide an algorithm for the topological reconstruction of CityJSON datasets to append them according to this extension. Finally, we apply our methodology to three open datasets in order to validate our approach on real-world data. Our results show that the proposed CityJSON extension can represent all geometric information of a city model in a lossless way, providing additional topological information for the objects of the model.


2010 ◽  
Vol 129-131 ◽  
pp. 50-54
Author(s):  
Wei Ping Shao ◽  
Chun Yan Wang ◽  
Yong Ping Hao ◽  
Peng Fei Zeng ◽  
Xiao Lei Xu

An ontology-based workflow (workflow-ontology) representation method is proposed, motivated by the observation that a workflow model needs semantic information as well as structural information. Workflow-ontology concepts are composed of the classes and subclasses of the workflow. The properties of these concepts, including their values and characteristics, are redefined, and a workflow-ontology modeling method is then put forward based on these ontology expressions and definitions. Using product examination and approval workflows as an example, the corresponding workflow-ontology model (WFO) is built.


2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Yinyu Lan ◽  
Shizhu He ◽  
Kang Liu ◽  
Xiangrong Zeng ◽  
Shengping Liu ◽  
...  

Abstract
Background: Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, which creates a demand for medical knowledge graph completion (MedKGC). MedKGC can find new facts based on the existing knowledge in the KGs. The path-based knowledge reasoning algorithm is one of the most important approaches to this task. This type of method has received great attention in recent years because of its high performance and interpretability. Traditional methods such as the path ranking algorithm take the paths between an entity pair as atomic features. However, medical KGs are very sparse, which makes it difficult to model effective semantic representations for extremely sparse path features. The sparsity in medical KGs is mainly reflected in the long-tailed distribution of entities and paths. Previous methods consider only the context structure in the paths of the knowledge graph and ignore the textual semantics of the symbols in the path. Their performance therefore cannot be improved further, because of both entity sparseness and path sparseness.
Methods: To address the above issues, this paper proposes two novel path-based reasoning methods that tackle the sparsity of entities and of paths respectively, adopting the textual semantic information of entities and paths for MedKGC. By using the pre-trained model BERT and combining the textual semantic representations of the entities and the relationships, we model the task of symbolic reasoning in the medical KG as a numerical computation problem in textual semantic representation.
Results: Experimental results on a publicly available, authoritative Chinese symptom knowledge graph demonstrate that the proposed method is significantly better than state-of-the-art path-based knowledge graph reasoning methods, and the average performance is improved by 5.83% for all relations.
Conclusions: In this paper, we propose two new knowledge graph reasoning algorithms, which adopt the textual semantic information of entities and paths and can effectively alleviate the sparsity problem of entities and paths in MedKGC. As far as we know, this is the first method to use pre-trained language models and text path representations for medical knowledge reasoning. Our method can complete the impaired symptom knowledge graph in an interpretable way, and it outperforms state-of-the-art path-based reasoning methods.
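To make the notion of paths as atomic features concrete, the sketch below enumerates the relation sequences that connect an entity pair in a toy graph. It illustrates the general path-feature idea only, not the method proposed in this paper: the triples are invented for the example, and the function name `relation_paths` and the `max_len` hop limit are assumptions.

```python
from collections import defaultdict


def relation_paths(triples, source, target, max_len=3):
    """Enumerate relation sequences linking source to target within max_len hops."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))

    found = []
    # Each stack entry: (current node, relations so far, nodes already visited).
    stack = [(source, (), {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and path:
            found.append(path)
            continue
        if len(path) == max_len:
            continue
        for rel, nxt in graph[node]:
            if nxt not in visited:  # avoid cycles
                stack.append((nxt, path + (rel,), visited | {nxt}))
    return sorted(found)


# Invented toy triples, for illustration only.
triples = [
    ("fever", "symptom_of", "flu"),
    ("flu", "treated_by", "rest"),
    ("fever", "relieved_by", "rest"),
]

paths = relation_paths(triples, "fever", "rest")
```

Each distinct relation sequence is treated as one feature. In a sparse graph most sequences occur for very few entity pairs, which is the path-sparsity problem the abstract describes.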


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257092
Author(s):  
Jianyi Liu ◽  
Xi Duan ◽  
Ru Zhang ◽  
Youqiang Sun ◽  
Lei Guan ◽  
...  

The architectures of recent relation extraction models have evolved from shallow neural networks, such as convolutional or recurrent neural networks, to pre-trained language models such as BERT. However, these methods do not consider the semantic information inside the sequence or the long-distance dependency problem, even though the internal semantic information may contain useful knowledge that helps relation classification. To address these problems, this paper proposes a BERT-based relation classification method. Compared with existing BERT-based architectures, the proposed model can obtain the internal semantic information between an entity pair and better handle long-distance semantic dependencies. The fine-tuned pre-trained BERT model is used to extract the semantic representation of the sequence, and piecewise convolution is then applied to obtain the semantic information that influences the extraction results. Compared with existing methods, the proposed method achieves better accuracy on the relation extraction task because of the internal semantic information extracted from the sequence. However, generalization ability remains a problem that cannot be ignored, and the number of relation instances differs between categories. The focal loss function is adopted to address this problem by assigning a heavier weight to categories that have few instances or are hard to classify. Finally, the proposed method reaches an F1 score of 89.95% on the SemEval-2010 Task 8 dataset, superior to existing methods.
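The reweighting effect of the focal loss can be seen in a small numeric sketch. It uses the standard form $-\alpha (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. The values of `gamma` and `alpha` below are illustrative assumptions, since the abstract does not state the settings used in the paper.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one example, given the predicted probability of its true class."""
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled by alpha * (1 - p)^gamma.

    gamma down-weights well-classified examples; alpha is a per-class weight.
    Both defaults are assumed values for illustration.
    """
    p = min(max(p_true, 1e-12), 1.0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


easy, hard = 0.9, 0.1  # confident-correct vs. poorly classified example
easy_scale = focal_loss(easy) / cross_entropy(easy)  # (1 - 0.9)^2
hard_scale = focal_loss(hard) / cross_entropy(hard)  # (1 - 0.1)^2
```

With `gamma=0` and `alpha=1` the function reduces to plain cross-entropy. Larger `gamma` shrinks the contribution of easy examples much more than that of hard ones, and a larger `alpha` for rare classes raises their share of the total loss.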


Author(s):  
Na Zheng ◽  
Jie Yu Wu

A clustering method is presented that computes text similarity from both Latent Dirichlet Allocation (LDA) and the vector space model (VSM). The LDA topic model and the VSM term-weighting strategy are each used to calculate a text similarity, and the linear combination of the two results gives the final text similarity. The k-means clustering algorithm is then chosen for cluster analysis. The method addresses the lack of deep semantic information in traditional text clustering, and also the problem that LDA alone cannot distinguish texts because of excessive dimension reduction. Deep semantic information is thus mined from the text, and clustering efficiency is improved. Comparisons with traditional methods show that this algorithm improves the performance of text clustering.
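The linear combination of the two similarities can be sketched as below. This is a minimal illustration under stated assumptions: cosine similarity is assumed for both components, the mixing weight `lam` is a hypothetical parameter, and the term weights and topic distributions are made-up inputs standing in for real VSM and LDA output. The k-means step is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(vsm_a, vsm_b, topics_a, topics_b, lam=0.5):
    """Mix VSM similarity and topic-distribution similarity with weight lam."""
    return lam * cosine(vsm_a, vsm_b) + (1.0 - lam) * cosine(topics_a, topics_b)


# Made-up inputs: term weights (VSM) and topic distributions (as if from LDA).
doc1_vsm = {"cluster": 1.0, "text": 1.0}
doc2_vsm = {"cluster": 1.0, "topic": 1.0}
doc1_topics = {0: 0.9, 1: 0.1}
doc2_topics = {0: 0.8, 1: 0.2}

score = combined_similarity(doc1_vsm, doc2_vsm, doc1_topics, doc2_topics, lam=0.5)
```

Here the two documents share only half of their terms but have nearly identical topic distributions, so the combined score lies between the lower VSM similarity and the higher topic similarity.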


Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1806
Author(s):  
Zunwang Ke ◽  
Zhe Li ◽  
Chenzhi Zhou ◽  
Jiabao Sheng ◽  
Wushour Silamu ◽  
...  

Social media has had a revolutionary impact because it provides an ideal platform for sharing information; however, it also leads to the publication and spreading of rumors. Existing rumor detection methods have relied on finding cues from only user-generated content, user profiles, or the structures of wide propagation. However, previous works have ignored the organic combination of wide dispersion structures and text semantics in rumor detection. To this end, we propose KZWANG, a framework for rumor detection that provides sufficient domain knowledge to classify rumors accurately, in which semantic information and a propagation heterogeneous graph are symmetrically fused. We utilize an attention mechanism to learn a semantic representation of text and introduce a GCN to capture the global and local relationships among all the source microblogs, reposts, and users. An organic combination of text semantics and propagating heterogeneous graphs is then used to train a rumor detection classifier. Experiments on Sina Weibo, Twitter15, and Twitter16 rumor detection datasets demonstrate the proposed model’s superiority over baseline methods. We also conduct an ablation study to understand the relative contributions of the various aspects of the proposed method.


2020 ◽  
Vol 17 (2) ◽  
pp. 537-552
Author(s):  
Huan Zhao ◽  
Jie Cao ◽  
Mingquan Xu ◽  
Jian Lu

In the conventional sequence-to-sequence (seq2seq) model for abstractive summarization, the internal transformation structure of recurrent neural networks (RNNs) is completely deterministic. Therefore, the learned semantic information is far from enough to represent all semantic details and context dependencies, resulting in redundant summaries and poor consistency. In this paper, we propose a variational neural decoder text summarization model (VND). The model introduces a series of implicit variables by combining a variational RNN and a variational autoencoder, which are used to capture complex semantic representations at each step of decoding. It includes a standard RNN layer and a variational RNN layer [5]. These two network layers generate a deterministic hidden state and a random hidden state, respectively. We use these two RNN layers to establish the dependence between the implicit variables of adjacent time steps. In this way, the model structure can better capture the complex semantics and the strong dependence between adjacent time steps when outputting the summary, thereby improving summary generation. The experimental results show that, on the LCSTS and English Gigaword text summarization datasets, our model improves significantly over the baseline model.

