Analyzing cloud based reviews for product ranking using feature based clustering algorithm

2018 ◽  
Vol 22 (S3) ◽  
pp. 6977-6984 ◽  
Author(s):  
N. Gobi ◽  
A. Rathinavelu
Author(s):  
Danish Irfan ◽  
Xiaofei Xu ◽  
Zengyou He ◽  
Shengchun Deng ◽  
Yunming Ye

Author(s):  
Na Guo ◽  
Yiyi Zhu

The result of the K-means clustering algorithm depends on the initial cluster centers and is not always globally optimal. Therefore, the features of vehicle driving data from integrated navigation are clustered with the global K-means clustering algorithm. A mathematical vehicle model based on GPS/DR integrated navigation is constructed, and driving data from GPS/DR integrated navigation, such as vehicle acceleration, are collected. After the driving data features are extracted, the feature parameters are reduced in dimension with kernel principal component analysis to remove redundancy. The global K-means clustering algorithm converts the clustering problem into a series of sub-clustering problems: at the end of each iteration, an incremental method selects the optimal initial center for the next cluster. After the optimal number of clusters is determined, the feature clustering of the vehicle driving data is completed. The experimental results show that the global K-means clustering algorithm has a clustering error of only 1.37% on vehicle driving data features and achieves high-precision clustering of those features.
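
The incremental center selection described in this abstract can be illustrated with a minimal sketch of the generic global K-means procedure. This is not the authors' implementation: the function names, the plain-tuple point format, and the exhaustive search over every data point as a candidate center are assumptions made for illustration, and the GPS/DR modelling and kernel PCA steps are omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def global_kmeans(points, k_max):
    """Return {k: (centers, error)} for k = 1..k_max.

    The k-cluster solution reuses the (k-1)-cluster centers and tries every
    data point as the initial position of the one new center (assumption:
    exhaustive search, which is affordable only for small data sets).
    """
    solutions = {1: kmeans(points, [mean(points)])}
    for k in range(2, k_max + 1):
        prev_centers = solutions[k - 1][0]
        best = None
        for candidate in points:
            result = kmeans(points, prev_centers + [tuple(candidate)])
            if best is None or result[1] < best[1]:
                best = result
        solutions[k] = best
    return solutions
```

Because every value of k up to `k_max` is solved along the way, the per-k errors returned here could feed whatever criterion is used to choose the number of clusters; the abstract does not say which criterion the authors applied.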


2020 ◽  
pp. 1-31
Author(s):  
Abdul Rafae Khan ◽  
Asim Karim ◽  
Hassan Sajjad ◽  
Faisal Kamiran ◽  
Jia Xu

Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content. It lacks standard spelling and hence poses several normalization challenges during automatic language processing. In this article, we present a feature-based clustering framework for the lexical normalization of Roman Urdu corpora, which includes a phonetic algorithm, UrduPhone; a string matching component; a feature-based similarity function; and a clustering algorithm, Lex-Var. UrduPhone encodes Roman Urdu strings to their pronunciation-based representations. The string matching component handles character-level variations that occur when writing Urdu using Roman script. The similarity function incorporates various phonetic-based, string-based, and contextual features of words. The Lex-Var algorithm is a variant of the k-medoids clustering algorithm that groups lexical variations of words. It contains a similarity threshold to balance the number of clusters and their maximum similarity. The framework allows feature learning and optimization in addition to the use of predefined features and weights. We evaluate our framework extensively on four real-world datasets and show an F-measure gain of up to 15% over baseline methods. We also demonstrate the superiority of UrduPhone and Lex-Var over the respective alternative algorithms in our clustering framework for the lexical normalization of Roman Urdu.
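
The idea of a medoid-based clustering with a similarity threshold can be sketched as follows. This is not Lex-Var and does not use UrduPhone: the similarity function is a stand-in built on Python's `difflib.SequenceMatcher`, and the function names, the threshold value, and the update loop are assumptions chosen only to show how a threshold lets the number of clusters emerge from the data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1].

    Assumption: the paper's function combines phonetic, string, and
    contextual features; this placeholder uses character overlap only.
    """
    return SequenceMatcher(None, a, b).ratio()


def threshold_medoid_clustering(words, threshold=0.6, max_iter=10):
    """Group spelling variants around medoids (hypothetical sketch).

    A word joins its most similar medoid if the similarity reaches the
    threshold; otherwise it starts a new cluster as its own medoid.
    """
    medoids = []
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for w in words:
            best = max(clusters, key=lambda m: similarity(w, m), default=None)
            if best is not None and similarity(w, best) >= threshold:
                clusters[best].append(w)
            else:
                clusters[w] = [w]
        # New medoid: the member most similar, in total, to its cluster.
        new_medoids = [
            max(members, key=lambda c: sum(similarity(c, o) for o in members))
            for members in clusters.values()
            if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [members for members in clusters.values() if members]
```

Raising the threshold in this sketch produces more, tighter clusters, and lowering it merges variants more aggressively, which is the trade-off the abstract attributes to the threshold.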


Author(s):  
Saddam Hussain ◽  
Mohd Wazir Mustafa ◽  
Touqeer Ahmed Jumani ◽  
Shadi Khan Baloch ◽  
Muhammad Salman Saeed

2019 ◽  
Vol 8 (1) ◽  
pp. 15
Author(s):  
Mostafa Langarizadeh ◽  
Rozi Mahmud

Introduction: Thresholding is one of the most important steps in segmentation whenever a specific part of an image must be detected. Previous researchers have frequently used several thresholding methods, either bi-level techniques such as DBT or multilevel techniques such as 3S. A new histogram feature thresholding method is implemented to detect the lesion area in digital mammograms and is compared, in terms of segmentation quality and segmentation time, with 3S (Shrinking-Search-Space) multithresholding and the FCM method as benchmarks. Materials and Methods: The algorithms were tested on 188 digital mammograms. Each digital mammogram was preprocessed by cropping the unnecessary area, resizing the image to 1024 by 1024 pixels, and normalizing the pixel values with a simple contrast stretching method. Results: The results of the suggested method differ from those of the 3S and FCM methods, and the suggested method is faster than both, which is a further advantage over the others. Previous studies showed that FCM is not a reliable clustering algorithm and needs several runs to give a reliable result (1). The results of this study are consistent with that finding. Conclusions: The suggested method may be used as a reliable thresholding method for detecting the lesion area.
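
The normalization step mentioned under Materials and Methods can be illustrated with a minimal min-max contrast stretch. The abstract does not give the exact formula the authors used, so the linear mapping, the 0 to 255 output range, and the function name below are assumptions.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly rescale pixel values so they span [out_min, out_max].

    `image` is a list of rows of numeric pixel values (assumed format).
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]
```

For example, an image whose values run from 50 to 200 is mapped so that 50 becomes 0 and 200 becomes 255, with intermediate values spaced proportionally.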

