Improving the Accuracy and Diversity of Feature Extraction From Online Reviews Using Keyword Embedding and Two Clustering Methods

2021 ◽  
Author(s):  
Seyoung Park ◽  
Harrison Kim

Abstract: In product design, it is essential to understand customers’ preferences for product features. Traditional methods, including surveys and interviews, are time-consuming and costly. As an alternative, research on utilizing online data for user analysis has been actively conducted. Although various methods have been proposed in this domain, most of them focus on the main features or usages of the product. However, from the manufacturer’s perspective, sub-features are as crucial as main features or usages, because preferences for sub-features are needed for component configuration in actual product development. As a first step toward solving this problem, this paper proposes a methodology to extract and cluster sub-features by incorporating phrase embedding into the previous word embedding. The presented methodology also increases the accuracy and diversity of the clustering result by using X-means clustering as a noise filter and adopting spectral clustering.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5755
Author(s):  
Pei Zhang ◽  
Siwei Wang ◽  
Jingtao Hu ◽  
Zhen Cheng ◽  
Xifeng Guo ◽  
...  

With the enormous amount of multi-source data produced by various sensors and feature extraction approaches, multi-view clustering (MVC) has attracted growing research attention and is widely used in data analysis. Most existing multi-view clustering methods rest on the assumption that all of the views are complete. However, in many real scenarios, multi-view data are incomplete for reasons such as hardware failure or incomplete data collection. In this paper, we propose an adaptive weighted graph fusion incomplete multi-view subspace clustering (AWGF-IMSC) method to solve the incomplete multi-view clustering problem. First, to eliminate the noise present in the original space, we transform the complete original data into latent representations, which contribute to better graph construction for each view. Then, we incorporate feature extraction and incomplete graph fusion into a unified framework in which the two processes can negotiate with each other to serve the graph learning task. A sparse regularization is imposed on the complete graph to make it more robust to view inconsistency. In addition, the importance of different views is learned automatically, further guiding the construction of the complete graph. An effective iterative algorithm with convergence is proposed to solve the resulting optimization problem. Compared with existing state-of-the-art methods, the experimental results on several real-world datasets demonstrate the effectiveness and advancement of our proposed method.


2019 ◽  
Vol 9 (5) ◽  
pp. 987 ◽  
Author(s):  
Naveed Hussain ◽  
Hamid Turab Mirza ◽  
Ghulam Rasool ◽  
Ibrar Hussain ◽  
Mohammad Kaleem

Online reviews about the purchase of products or services have become the main source of users’ opinions. In order to gain profit or fame, spam reviews are often written to promote or demote a few target products or services. This practice is known as review spamming. In the past few years, a variety of methods have been suggested to solve the issue of spam reviews. In this study, the researchers carry out a comprehensive review of existing studies on spam review detection using the Systematic Literature Review (SLR) approach. Overall, 76 existing studies are reviewed and analyzed. The researchers evaluated the studies based on how features are extracted from review datasets and on the different methods and techniques that are employed to solve the review spam detection problem. Moreover, this study analyzes different metrics that are used for the evaluation of review spam detection methods. This literature review identified two major feature extraction techniques and two different approaches to review spam detection. In addition, this study has identified different performance metrics that are commonly used to evaluate the accuracy of review spam detection models. Lastly, this work presents an overall discussion of different feature extraction approaches from review datasets, the proposed taxonomy of spam review detection approaches, evaluation measures, and publicly available review datasets. Research gaps and future directions in the domain of spam review detection are also presented. This research identified that the success factors of any review spam detection method have interdependencies. Feature extraction depends upon the review dataset, and the accuracy of review spam detection methods depends upon the selection of the feature engineering approach. Therefore, for the successful implementation of a spam review detection model and to achieve better accuracy, these factors need to be considered together. To the best of the researchers’ knowledge, this is the first comprehensive review of existing studies in the domain of spam review detection using the SLR process.


Author(s):  
Chanida Kaewphet ◽  
Nawaporn Wisitpongpun

Reviews on e-commerce sites play an important role in online purchasing decisions. Consumers are likely to read reviews and comments on products from other consumers. In addition to opinions that reflect consumers' trust in products, reviews also reveal each product's distinctive properties. Today, there are many online reviews, resulting in an enormous number of comments and suggestions. However, as reading reviews in full is quite difficult, this article presents 3 algorithms for automatic extraction of product features hidden in e-commerce reviews: a traditional frequency-based product feature extraction (F-PFE), a syntax analyzer system (SAS), and a hybrid approach called the frequency and syntax-based product feature extraction (FaS-PFE). The proposed algorithms were tested against 4 different types of products: shampoo, skincare, mobile phone, and tablet, using reviews from amazon.com. Based on the product reviews used in this study, it was found that the SAS can help improve precision by 15% when compared with the traditional F-PFE approach. When considering both word frequency and syntax, FaS-PFE clearly outperforms the other two approaches with 94.00% precision and 95.13% recall.
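As a rough illustration of the frequency-based idea only, the following minimal sketch keeps words that appear in a minimum share of reviews. It is not the F-PFE, SAS, or FaS-PFE algorithm from the article: the stopword list, the `min_support` parameter, the function name `frequent_features`, and the sample reviews are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list (an assumption for this sketch); a real system
# would more likely use a part-of-speech tagger to keep nouns only.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "in", "of", "to"}


def frequent_features(reviews, min_support=0.5):
    """Return words that occur in at least `min_support` of the reviews."""
    doc_freq = Counter()
    for review in reviews:
        # Count each word once per review (document frequency).
        tokens = set(re.findall(r"[a-z]+", review.lower()))
        doc_freq.update(tokens - STOPWORDS)
    threshold = min_support * len(reviews)
    return sorted(word for word, n in doc_freq.items() if n >= threshold)


# Made-up sample reviews, for illustration only.
reviews = [
    "The battery is great and the screen is sharp.",
    "Battery drains fast, but the screen is bright.",
    "Great camera, weak battery.",
    "The screen cracked in a week.",
]

print(frequent_features(reviews))        # ['battery', 'great', 'screen']
print(frequent_features(reviews, 0.75))  # ['battery', 'screen']
```

At the lower threshold the adjective "great" slips through as a false positive, which is the kind of error a syntactic filter could plausibly remove.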


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Najla M. Alharbi ◽  
Norah S. Alghamdi ◽  
Eman H. Alkhammash ◽  
Jehad F. Al Amri

Consumer feedback is highly valuable to businesses for assessing their performance and is also beneficial to customers, as it gives them an idea of what to expect from new products. This research aims to evaluate different deep learning approaches for accurately predicting the opinion of customers based on mobile phone reviews obtained from Amazon.com. The prediction is based on analysing these reviews and categorizing them as positive, negative, or neutral. Different deep learning algorithms have been implemented and evaluated, namely simple RNN with its four variants: Long Short-Term Memory Networks (LRNN), Group Long Short-Term Memory Networks (GLRNN), gated recurrent unit (GRNN), and update recurrent unit (UGRNN). All evaluated algorithms are combined with word embedding as the feature extraction approach for sentiment analysis, including GloVe, word2vec, and FastText by Skip-grams. The five different algorithms with the three feature extraction methods are evaluated based on accuracy, recall, precision, and F1-score for both balanced and unbalanced datasets. For the unbalanced dataset, it was found that the GLRNN algorithm with FastText feature extraction scored the highest accuracy of 93.75%. This is the highest accuracy on this dataset when compared with other methods mentioned in the literature. For the balanced dataset, the highest achieved accuracy was 88.39% by the LRNN algorithm.
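The four evaluation measures named above can be computed for a three-class task as in the minimal sketch below. The abstract does not say how per-class scores were averaged, so macro averaging is an assumption here, and the function name `evaluate` and the sample labels are made up for illustration.

```python
def evaluate(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an assumption of this sketch) gives every class
    equal weight regardless of how many examples it has.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels, for illustration only.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]
scores = evaluate(y_true, y_pred, labels=["pos", "neg", "neu"])
print(scores)
```

On an unbalanced dataset, accuracy can look high while minority-class recall is poor, which is one reason to report all four measures together.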


Author(s):  
Xian Zhong ◽  
Wenxin Huang ◽  
Ruiqi Luo ◽  
Can Wang

Vision-based behavior recognition is the analysis and recognition of human behavior in video. It has been widely used in many aspects such as multimedia information retrieval, behavior monitoring, and robot perception. This paper uses the Independent Subspace Analysis (ISA) deep network model feature extraction method, which is based on the ISA model and neural network theory, and combines data preprocessing methods, k-means clustering methods, and Support Vector Machine (SVM) classifiers to achieve video classification and identification of human behavior. The ISA-based deep network model feature extraction method is an unsupervised learning method that can obtain behavior characteristics with good invariance and characterization capabilities in video human behavior. The experiment was conducted on the basis of the Hollywood2 human behavior data set. This experiment was compared with other commonly used human behavior feature extraction and recognition methods. The experimental results validated the effectiveness and advantages of this method in the classification and recognition of human behavior.


2013 ◽  
Vol 52 (05) ◽  
pp. 382-394 ◽  
Author(s):  
M. R. Boland ◽  
R. Miotto ◽  
J. Gao ◽  
C. Weng

Summary. Background: When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and ineffective indexing methods for reducing the trial search space. Objectives: This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. Methods: We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. Results: We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all the 1,437 features we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. Conclusions: It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.
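The iterative filtering step described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the trial identifiers, the feature names, the boolean encoding of eligibility features, and the `next_question` heuristic (the study selected features manually) are all hypothetical.

```python
from collections import Counter


def filter_trials(trials, answers):
    """Keep trials that are consistent with every answer given so far.

    trials:  dict of trial id -> {feature: required value}
    answers: dict of feature -> the patient's value
    A trial is dropped only if it requires a value the patient contradicts;
    trials that do not mention a feature are kept.
    """
    remaining = dict(trials)
    for feature, value in answers.items():
        remaining = {
            tid: req
            for tid, req in remaining.items()
            if feature not in req or req[feature] == value
        }
    return remaining


def next_question(trials, asked):
    """Hypothetical heuristic: ask about the feature that the most remaining
    trials constrain. Ties are broken alphabetically for determinism."""
    counts = Counter(
        f for req in trials.values() for f in req if f not in asked
    )
    if not counts:
        return None
    return min(counts, key=lambda f: (-counts[f], f))


# Made-up trials and features, for illustration only.
trials = {
    "T1": {"her2_positive": True, "prior_chemotherapy": False},
    "T2": {"her2_positive": False},
    "T3": {"prior_chemotherapy": True, "postmenopausal": True},
    "T4": {"her2_positive": True, "postmenopausal": True},
}

print(next_question(trials, asked=set()))  # her2_positive
step1 = filter_trials(trials, {"her2_positive": True})
print(sorted(step1))                       # ['T1', 'T3', 'T4']
step2 = filter_trials(step1, {"prior_chemotherapy": False})
print(sorted(step2))                       # ['T1', 'T4']
```

Each answer narrows the candidate set, so a handful of well-chosen eligibility questions can shrink the search space considerably.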

