Translation Accuracy Evaluation of English Complex Long Sentences Based on Multi-Label Clustering Algorithms

Author(s):  
Fei Wu

In the traditional view, the translation evaluation of English complex long sentences has been limited to whether, or how well, the semantic content of the original text is transferred, so many factors that are not strictly linguistic yet directly affect the evaluation fall outside its scope and go unexplained. To address this problem, a multi-label clustering algorithm is proposed for evaluating the translation accuracy of English complex long sentences. The algorithm is introduced into translation evaluation to describe the translation and detection parameters of complex long sentences comprehensively; the resulting completeness of description, accuracy of generalization and rationality of interpretation lay a solid foundation for English translation activities.
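
As one hedged illustration of how multi-label clustering of sentence-level evaluation parameters could look, the sketch below soft-clusters feature vectors and keeps every cluster whose posterior probability clears a threshold. The three features, the Gaussian mixture model and the 0.3 threshold are assumptions made for the example, not the method of the paper.

```python
# A minimal multi-label clustering sketch (assumptions: features, model, threshold).
import numpy as np
from sklearn.mixture import GaussianMixture

# One row per candidate translation, e.g. [semantic fidelity, fluency, structure shift].
scores = np.random.rand(50, 3)

gmm = GaussianMixture(n_components=4, random_state=0).fit(scores)
posteriors = gmm.predict_proba(scores)

# A sentence receives every cluster label whose posterior is large enough,
# which is what makes the assignment multi-label rather than a hard partition.
labels = [np.flatnonzero(p >= 0.3).tolist() for p in posteriors]
print(labels[:5])
```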

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information about the "right answer" is provided for any of the objects. Many clustering algorithms have been proposed for different applications, and sentence clustering is one of the most useful techniques among them. Here, hierarchical clustering is applied at multiple levels to improve accuracy. A POS tagger and the Porter stemmer are used for tagging and stemming, and the WordNet dictionary is used to determine similarity via the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates several parameters for measuring the similarity between words. To identify disambiguated words, sense identification is performed for adjectives and the senses are compared. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our approach improves considerably, reaching an accuracy of 91.2%.
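
The word-similarity step can be pictured with a minimal sketch based on NLTK's WordNet interface and the Brown information-content file; the restriction to noun senses, the example word pair and the thresholding comment are illustrative assumptions rather than the authors' implementation.

```python
# Requires: nltk.download('wordnet'), nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

def jcn_word_similarity(word_a, word_b):
    """Best Jiang-Conrath similarity over the noun senses of two words."""
    best = 0.0
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            try:
                best = max(best, sa.jcn_similarity(sb, brown_ic))
            except Exception:
                continue  # skip sense pairs for which the measure is undefined
    return best

# Words would be grouped together when their similarity exceeds a mean-based threshold.
print(jcn_word_similarity('car', 'truck'))
```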


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using the k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest-neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different, "upward" clustering strategy that assembles one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
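
The following is a minimal sketch of k-NN-graph-based, non-hierarchical clustering using scikit-learn and SciPy; treating the connected components of a mutual k-NN graph as clusters, and the choice k = 5, are simplifying assumptions, not the "upward" assembly strategy proposed in the article.

```python
# Mutual k-NN graph clustering sketch (assumptions: mutual-k-NN simplification, k = 5).
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# Build a k-nearest-neighbor graph and keep an edge only if it is reciprocated.
A = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
A = A.minimum(A.T)

# Each connected component of the mutual k-NN graph is treated as one cluster.
n_clusters, labels = connected_components(A, directed=False)
print(n_clusters, np.bincount(labels))
```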


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by its devices. Clustering these data gives the electricity company a better understanding of the personalized and differentiated needs of its users. Objective: Existing clustering algorithms for processing such data generally suffer from insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: To improve clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting from an analysis of user load characteristics, samples of user electricity data were constructed. Daily load characteristic curves were extracted through an improved extreme learning machine clustering algorithm and effective index criteria, and clustering analysis was carried out for different users from industrial, commercial and residential areas. The improved algorithm, called the Unsupervised Extreme Learning Machine (US-ELM), is an extension of the original Extreme Learning Machine (ELM) that performs the unsupervised clustering task on top of the original ELM. Results: Experiments on four different data sets were run in MATLAB and compared with other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm achieves higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce time consumption and improve the effectiveness of clustering.
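
A simplified sketch of the ELM-style clustering idea is given below: daily load curves are projected through a random hidden layer and clustered in that feature space. The full US-ELM additionally uses a graph-Laplacian embedding; the hidden-layer size, the sigmoid activation and k = 3 are assumptions for illustration.

```python
# Simplified ELM-style clustering sketch (not the full US-ELM with Laplacian embedding).
import numpy as np
from sklearn.cluster import KMeans

def elm_hidden_layer(X, n_hidden=100, seed=0):
    """Random-weight sigmoid hidden layer, the fixed feature map used by ELM."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# X: one row per user, 24 hourly readings forming a daily load curve (toy data).
X = np.random.rand(200, 24)
H = elm_hidden_layer(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(H)
print(np.bincount(labels))
```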


Author(s):  
Herry Sujaini

Extended Word Similarity Based (EWSB) clustering is a word clustering algorithm based on word-similarity values computed from a corpus. One benefit of clustering with this algorithm is that it can improve the output of a statistical machine translation system. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator when the algorithm was applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, with Minang as the target language. The research found that the EWSB algorithm is quite effective when Minang is used as the target language, improving translation accuracy by 6.36%.
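
In the spirit of corpus-based word clustering, the sketch below builds co-occurrence vectors from a toy corpus and groups words by cosine similarity with average-linkage hierarchical clustering; the window size, the toy sentences and the clustering step are illustrative assumptions and not the EWSB algorithm itself.

```python
# Corpus-based word clustering sketch (assumptions: +/-2 window, toy corpus, 3 clusters).
import numpy as np
from sklearn.metrics.pairwise import cosine_distances
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

corpus = [["saya", "pergi", "ke", "pasar"], ["dia", "pergi", "ke", "sekolah"]]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences inside a +/-2 word window to build one vector per word.
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                cooc[index[w], index[sent[j]]] += 1

# Cluster words whose co-occurrence vectors are close in cosine distance.
D = np.clip(cosine_distances(cooc), 0.0, None)
labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=3, criterion="maxclust")
print(dict(zip(vocab, labels)))
```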


Author(s):  
M. Tanveer ◽  
Tarun Gupta ◽  
Miten Shah ◽  

Twin Support Vector Clustering (TWSVC) is a clustering algorithm inspired by the principles of the Twin Support Vector Machine (TWSVM). TWSVC has already outperformed other traditional plane-based clustering algorithms. However, TWSVC uses the hinge loss, which maximizes the shortest distance between clusters and hence suffers from noise sensitivity and low re-sampling stability. In this article, we propose Pinball loss Twin Support Vector Clustering (pinTSVC) as a clustering algorithm. The proposed pinTSVC model incorporates the pinball loss function in the plane clustering formulation. The pinball loss function introduces favorable properties such as noise insensitivity and re-sampling stability. The time complexity of the proposed pinTSVC remains equivalent to that of TWSVC. Extensive numerical experiments on noise-corrupted benchmark UCI and artificial datasets are provided. Results of the proposed pinTSVC model are compared with TWSVC, Twin Bounded Support Vector Clustering (TBSVC) and fuzzy c-means clustering (FCM). Detailed and exhaustive comparisons demonstrate the better performance and generalization of the proposed pinTSVC on noise-corrupted datasets. Further experiments and analysis on the performance of the above-mentioned clustering algorithms on structural MRI (sMRI) images taken from the ADNI database, face clustering, and facial expression clustering demonstrate the effectiveness and feasibility of the proposed pinTSVC model.
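
To see why the pinball loss behaves differently from the hinge loss, the short sketch below contrasts the two; the value tau = 0.5 and the interpretation of u as a signed margin quantity are assumptions for illustration.

```python
# Hinge vs. pinball loss sketch (assumption: tau = 0.5).
import numpy as np

def hinge_loss(u):
    """Hinge loss: zero on one side of the margin, linear on the other."""
    return np.maximum(0.0, u)

def pinball_loss(u, tau=0.5):
    """Pinball loss: linear on both sides, with slope -tau for u < 0,
    so points on the "safe" side still contribute a small penalty."""
    return np.where(u >= 0, u, -tau * u)

u = np.linspace(-2.0, 2.0, 9)
print(hinge_loss(u))
print(pinball_loss(u))
```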


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using a modified spectral clustering (SC), since the similarity measure plays an imperative role in clustering industrial data for predicting churn with better accuracy. The linear Euclidean distance in traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial, and one telecommunications database related to customer churn. Three existing clustering algorithms, namely k-means, density-based spatial clustering of applications with noise (DBSCAN) and conventional SC, are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing algorithms in terms of Jaccard index, f-score, recall, precision and accuracy. Finally, we test the significance of the clustering results with the Wilcoxon signed-rank test, the Wilcoxon rank-sum test, and the sign test. The comparative study shows that the outcomes of the proposed algorithm are promising, especially for clusters of arbitrary shape.
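
One way to picture the modification is to run spectral clustering on an affinity matrix built from a non-Euclidean distance. In the sketch below, `s_distance` is only a hypothetical placeholder (it falls back to the Euclidean distance) standing in for the S-divergence-based S-distance, whose exact formula is not reproduced here; the Gaussian affinity with a median bandwidth and two clusters are further assumptions.

```python
# Spectral clustering with a swappable distance (s_distance is a placeholder, not Sd).
import numpy as np
from sklearn.cluster import SpectralClustering

def s_distance(x, y):
    # Placeholder: substitute the S-divergence-based distance here.
    return np.sqrt(np.sum((x - y) ** 2))

X = np.random.rand(100, 8)          # e.g. customer usage features (toy data)
n = len(X)
D = np.array([[s_distance(X[i], X[j]) for j in range(n)] for i in range(n)])

# Turn distances into a Gaussian affinity and run spectral clustering on it.
sigma = np.median(D)
A = np.exp(-(D ** 2) / (2 * sigma ** 2))
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(np.bincount(labels))
```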


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its good clustering effect and data expression ability. However, little research has focused on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill this research gap, we extend the data type of clustering to hesitant fuzzy linguistic information and propose a hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, providing decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.
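
A very small sketch of agglomerative clustering over hesitant fuzzy linguistic term sets is given below, with each HFLTS stored as the indices of its terms on a seven-term scale; the envelope-midpoint distance used here is a simplifying assumption and not the distance measure of the paper.

```python
# Agglomerative clustering of HFLTSs (assumption: envelope-midpoint distance, 2 clusters).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Each alternative is an HFLTS over a 7-term scale s0..s6, stored as term indices.
hflts = [{2, 3}, {3, 4}, {5, 6}, {0, 1}, {1, 2}]

def hflts_distance(a, b):
    """Distance between the midpoints of the two HFLTS envelopes (assumed measure)."""
    return abs((min(a) + max(a)) / 2 - (min(b) + max(b)) / 2)

n = len(hflts)
D = np.array([[hflts_distance(hflts[i], hflts[j]) for j in range(n)] for i in range(n)])
labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=2, criterion="maxclust")
print(labels)
```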


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting judgment indicators for the number of clusters. Also, small clusters are crucial for discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes connections between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the adjustable parameters to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
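
The local-density quantity that such an algorithm starts from can be sketched as follows, using a Gaussian kernel with a single cutoff distance; the per-cluster cutoff distances and densities and the bidirectional assembly of BCALoD itself are not reproduced, and d_c = 0.5 with a 20th-percentile cutoff density are illustrative assumptions.

```python
# Local-density sketch with a Gaussian kernel (assumptions: d_c = 0.5, percentile cutoff).
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (80, 2)),
               rng.normal(3, 0.3, (5, 2))])       # one large and one small cluster

D = cdist(X, X)
d_c = 0.5
rho = np.sum(np.exp(-(D / d_c) ** 2), axis=1) - 1.0   # exclude the point itself

# Points whose local density exceeds a cutoff density are treated as cluster seeds.
rho_cut = np.percentile(rho, 20)
print("candidate core points:", np.sum(rho > rho_cut))
```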


Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating a weighted membership Kullback-Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally smoothed value of that membership in the pixel's vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is the original pixel data distance plus a fraction of the distance generated from the locally smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally smoothed membership function of that pixel multiplied by an exponentially distributed function of the negative pixel distance relative to the minimum distance provided by the nearest cluster center. Therefore, by incorporating the locally smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
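
For orientation, the plain fuzzy C-means baseline that such KL-divergence-based variants extend can be sketched as below; the KL fuzzifier and the locally smoothed membership and data terms of LDMKLFCM are not included, and m = 2 with 100 iterations are assumptions.

```python
# Plain fuzzy C-means baseline sketch (not LDMKLFCM; assumptions: m = 2, 100 iterations).
import numpy as np

def fcm(X, c=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return U, centers

X = np.random.rand(300, 1)                              # e.g. flattened pixel intensities
U, centers = fcm(X)
print(centers.ravel())
```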


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering is a clustering approach based on spectral graph theory. Because it has a deep theoretical foundation and handles non-convex distributions well, it has received much attention in machine learning and data mining. The algorithm is easy to implement and outperforms traditional clustering algorithms such as K-means. This paper aims to give some intuition about spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, the clustering steps, and so on. Finally, to address the disadvantages of spectral clustering, some improvements are briefly introduced.
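
The clustering steps described above can be summarized in a short sketch: Gaussian affinity, symmetric normalized Laplacian, spectral embedding from the smallest eigenvectors, then k-means; the bandwidth sigma = 1.0, k = 2 and the toy data are illustrative assumptions.

```python
# Basic spectral clustering sketch (assumptions: sigma = 1.0, k = 2, toy data).
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k=2, sigma=1.0):
    # Gaussian affinity matrix from pairwise squared Euclidean distances.
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(X)) - Dinv @ A @ Dinv
    # Embed each point using the k smallest-eigenvalue eigenvectors, then run k-means.
    _, vecs = eigh(L)
    emb = vecs[:, :k]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
print(np.bincount(spectral_clustering(X)))
```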

