Intention based Clustering of Relevant Reviews using Content Similarity

2021 ◽  
Vol 23 (11) ◽  
pp. 612-618
Author(s):  
K. Pon Karthika ◽  
Dr. S. Kavi Priya ◽  

The proposed work deals with finding related reviews posted on various online forums. Conventional methods for matching related documents compute content similarity over the entire review instead of partitioning it into segments that reveal different intentions. In this work, intention-based similarity clustering is introduced to measure the relatedness of two documents. The method forms document clusters based on the similarity of segments that share an intention. Segmentation points are identified using a number of text features that indicate where a segment boundary should be placed. The document clusters are then formed by grouping segments with similar intentions in the same cluster, and the similarities among segments with the same intention are computed. The proposed model is trained and evaluated on the TripAdvisor and Yelp Open Review datasets, and the evaluation results show that it produces more precise results when mining documents related to the user’s interest.
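The idea of comparing reviews segment by segment, rather than as whole documents, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the segments already carry an intention label (the labels "praise" and "complaint", the sample reviews, and the helper names are all hypothetical), and it substitutes plain bag-of-words cosine similarity for whatever similarity measure the paper uses.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def by_intention(segments):
    """Merge a review's segments into one bag of words per intention label."""
    bags = defaultdict(Counter)
    for intention, text in segments:
        bags[intention].update(text.lower().split())
    return bags


def intention_similarity(review_a, review_b) -> float:
    """Average cosine similarity over the intentions both reviews share."""
    bags_a, bags_b = by_intention(review_a), by_intention(review_b)
    shared = bags_a.keys() & bags_b.keys()
    if not shared:
        return 0.0
    return sum(cosine(bags_a[i], bags_b[i]) for i in shared) / len(shared)


# Hypothetical, pre-segmented reviews: (intention label, segment text).
review_a = [("praise", "the room was clean and quiet"),
            ("complaint", "breakfast was cold")]
review_b = [("praise", "clean and quiet room"),
            ("complaint", "the wifi was slow")]
review_c = [("complaint", "breakfast was cold and late")]

print(intention_similarity(review_a, review_c))
print(intention_similarity(review_b, review_c))
```

Here `review_a` and `review_c` are compared only on their complaint segments, so the praise text in `review_a` does not dilute the match, which is the effect that whole-document similarity cannot provide.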

Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1307
Author(s):  
Haoriqin Wang ◽  
Huaji Zhu ◽  
Huarui Wu ◽  
Xiaomin Wang ◽  
Xiao Han ◽  
...  

In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of new rice-related questions in Chinese are added every day. Rapid detection of semantically identical questions is the key to the success of a rice-related intelligent Q&A system. To allow fast and automatic detection of semantically identical rice-related questions, we propose a new method based on the Coattention-DenseGRU (Gated Recurrent Unit). According to the characteristics of rice-related questions, we applied Word2vec with the TF-IDF (Term Frequency–Inverse Document Frequency) method to process and analyze the text data and compared it with the Word2vec, GloVe, and TF-IDF methods. Combined with an agricultural word segmentation dictionary, Word2vec with TF-IDF effectively solved the problem of high-dimensional, sparse data in the rice-related text. Each network layer used the connection information of the features together with the hidden features of all previous recurrent layers. To alleviate the growth in feature vector size caused by dense concatenation, an autoencoder was applied after the concatenation. The experimental results show that rice-related question similarity matching based on Coattention-DenseGRU can improve the utilization of text features, reduce the loss of features, and achieve fast and accurate similarity matching on the rice-related question dataset. The precision and F1 values of the proposed model were 96.3% and 96.9%, respectively. Compared with seven other question similarity matching models, the proposed method achieves state-of-the-art results on our rice-related question dataset.
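One common way to combine Word2vec with TF-IDF is to weight each word vector by the word's TF-IDF score before averaging, which yields a dense, low-dimensional question vector. The sketch below assumes that interpretation; the abstract does not spell out the exact combination. The two-dimensional embeddings, the tokenized questions, and the function names are hypothetical stand-ins for a trained Word2vec model and a segmented corpus.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every term in tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def weighted_sentence_vector(tokens, embeddings, idf):
    """TF-IDF-weighted average of the vectors of tokens that have an embedding."""
    tf = Counter(t for t in tokens if t in embeddings)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        w = count * idf.get(term, 1.0)  # term frequency times IDF
        weight_sum += w
        for k, x in enumerate(embeddings[term]):
            total[k] += w * x
    # Tokens without any known embedding yield the zero vector.
    return [x / weight_sum for x in total] if weight_sum else total


# Hypothetical toy embeddings standing in for trained Word2vec vectors.
embeddings = {"rice": [1.0, 0.0], "blast": [0.0, 1.0], "how": [0.5, 0.5]}
docs = [["how", "rice"], ["how", "blast"], ["how", "rice", "blast"]]

idf = idf_table(docs)
vec = weighted_sentence_vector(["how", "rice"], embeddings, idf)
print(vec)
```

Because "how" appears in every question, its IDF is the lowest, so the resulting vector leans towards "rice" more than an unweighted average would.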


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Xiangwen Liao ◽  
Lingying Zhang ◽  
Jingjing Wei ◽  
Dingda Yang ◽  
Guolong Chen

User influence is a very important factor for microblog user recommendation in mobile social networks. However, most existing work on user influence analysis ignores users’ temporal features and fails to filter out marketing users with low influence, which limits the performance of recommendation methods. In this paper, a Tensor Factorization based User Cluster (TFUC) model is proposed. We first identify latent influential users by neural network clustering. Then, we construct a feature tensor from the latent influential users’ opinion, activity, and network centrality information. User influence is then predicted from the latent factors produced by the temporal restrained CP decomposition. Finally, we recommend microblog users by considering both user influence and content similarity. Our experimental results show that the proposed model significantly improves recommendation performance. The mean average precision of TFUC exceeds that of the baselines by at least 3.4%.


2021 ◽  
Author(s):  
Samuel Yuguru

Physics in general is successfully governed by quantum mechanics at the microscale and principles of relativity at the macroscale. Attempts to unify them using conventional methods have remained elusive for nearly a century. In this study, a classical gedanken experiment of electron-wave diffraction through a single slit is intuitively examined for its quantized states. A unidirectional monopole field as quanta of the electric field is pictorially conceptualized in 4D space-time. Its application to quantum mechanics and general relativity, in accordance with existing knowledge in physics, paves an alternative path towards their reconciliation. This assumes a multiverse at a hierarchy of scales with gravity localized to a body in space. Principles of special relativity are then sustained along inertial frames of extra dimensions within the proposed model. Such descriptions provide an approximate intuitive tool for examining physics in general from alternative perspectives using conventional methods, and this warrants further investigation.


2020 ◽  
Vol 12 (12) ◽  
pp. 5074
Author(s):  
Jiyoung Woo ◽  
Jaeseok Yun

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. Because the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data to the opinion analysis. In this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to label spam posts in advance. To construct the machine learning-based model, linguistic text features were extracted from the posted content using text mining techniques, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text both including and excluding special characters was used. A comparative analysis of two recurrent neural network (RNN) models, the simple RNN and the long short-term memory (LSTM) network, was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model is significantly more accurate than conventional machine learning based on text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zulie Pan ◽  
Yuanchao Chen ◽  
Yu Chen ◽  
Yi Shen ◽  
Xuanzhen Guo

A webshell is a malicious backdoor that allows remote access to and control of a web server by executing arbitrary commands. The wide use of obfuscation and encryption technologies has greatly increased the difficulty of webshell detection. To this end, we propose a novel webshell detection model that leverages grammatical features extracted from PHP code. The key idea is to combine the executable data characteristics of the PHP code with static text features for webshell classification. To verify the proposed model, we construct a cleaned webshell data set consisting of 2,917 samples from 17 webshell collection projects and conduct extensive experiments. We designed three sets of controlled experiments. The results show that the accuracy of all three algorithms exceeds 99.40%, with the highest reaching 99.66%; the recall rate increases by at least 1.8% and by as much as 6.75%; and the F1 value increases by 2.02% on average. These results not only confirm the efficiency of the grammatical features in webshell detection but also show that our system significantly outperforms several state-of-the-art rivals in terms of detection accuracy and recall rate.


2017 ◽  
Vol 34 (01) ◽  
pp. 1740007 ◽  
Author(s):  
Siqing Shan ◽  
Jihong Shi ◽  
Qi Yan

A modeling methodology for blog recommendation and forecasting based on information entropy is presented. With the increasing popularity of smartphones and the rapid development of the mobile Internet, the amount of user-generated content such as blogs is increasing daily. Valuable information, such as bloggers’ opinions, feelings, and attitudes, is often part of this content. Particularly in the context of an emergency, this information should also be used to facilitate decision-making. Current blog recommendation models primarily examine users’ interests or content similarity, whereas in this paper the value of the blog is considered. The primary contribution of this paper is an information-entropy-based blog recommendation model for finding valuable blogs to facilitate decision-making in an emergency context. A series of indicators for evaluating a blog in an emergency context is proposed. Using the method of information entropy, a blog recommendation model is developed. The model can also be used to forecast the future value of emergency blogs. The model has been tested and validated using crawled data from the Sina Blog, and the results demonstrate that it can effectively determine the value of emergency-related blogs.
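A widely used way to turn information entropy into indicator weights is the entropy weight method: an indicator whose values barely differ across blogs carries little information and receives a low weight. The sketch below assumes that method, which the abstract does not confirm, and the indicator columns (comments, views, reposts), the sample values, and the function name are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Return one weight per indicator (column) via the entropy weight method.

    `matrix` holds non-negative indicator values, one row per blog.
    At least two rows are required, because the entropy is normalized by log(n).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n_rows)
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Share of each blog in the column; an all-zero column is treated as uniform.
        probs = [x / total for x in column] if total else [1.0 / n_rows] * n_rows
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum <= 1e-12:
        # No indicator discriminates between blogs: fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / d_sum for d in divergences]


# Hypothetical indicators per blog: comments, views, reposts.
blogs = [
    [10, 5, 100],
    [10, 50, 120],
    [10, 500, 90],
]
weights = entropy_weights(blogs)
print(weights)
```

In this toy data every blog has the same number of comments, so that indicator gets a weight of effectively zero, while the widely varying view counts dominate the weighting.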


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Azra Nazir ◽  
Roohie Naaz Mir ◽  
Shaima Qureshi

Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to present a single idea in many different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is considerable scope for improvement in detecting intelligent plagiarism.
Design/methodology/approach: To realize this, the paper introduces a hybrid model that detects intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) summary generation using an encoder-decoder. An effective weighting scheme has been introduced to select the terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. Only if the value calculated in the last stage lies above a predefined threshold is the next semantic argument analyzed. When the similarity score for two documents is beyond the threshold, a short summary of the plagiarized documents is created.
Findings: Experimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism.
Originality/value: The proposed model can help academics stay updated by providing summaries of relevant articles. It would help eliminate the practice of plagiarism, which is infesting the academic community at an unprecedented pace. The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.


2020 ◽  
Vol 34 (07) ◽  
pp. 12144-12151
Author(s):  
Guan-An Wang ◽  
Tianzhu Zhang ◽  
Yang Yang ◽  
Jian Cheng ◽  
Jianlong Chang ◽  
...  

RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. This set-level alignment may lead to misalignment of some instances, which limits the performance of RGB-IR Re-ID. Different from existing methods, in this paper we propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing the distances between every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of 9.2% and 7.7% in terms of Rank-1 and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.


Author(s):  
Chang Lu ◽  
Chandan K Reddy ◽  
Prithwish Chakraborty ◽  
Samantha Kleinberg ◽  
Yue Ning

Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep-learning-based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured features. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention manipulating strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.

