data similarity
Recently Published Documents


TOTAL DOCUMENTS

96
(FIVE YEARS 36)

H-INDEX

11
(FIVE YEARS 3)

Author(s):  
Hiroki Sakaji ◽  
Teruaki Hayashi ◽  
Yoshiaki Fukami ◽  
Takumi Shimizu ◽  
Hiroyasu Matsushima ◽  
...  

2021 ◽  
pp. 1-28
Author(s):  
Hector Menendez

Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques can work blindly, but they rely on data similarity. One of the most popular areas in this field is clustering. Clustering groups data so that the elements within each cluster are strongly similar while the clusters remain distinct from one another. This field started with the K-means algorithm, one of the most popular algorithms in machine learning with extensive applications. Currently, there are multiple strategies to deal with the clustering problem. This review introduces some of the classical algorithms, focusing significantly on algorithms based on evolutionary computation, and explains some current applications of clustering to large datasets.
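
As a rough illustration of the K-means idea mentioned in this abstract, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not taken from the review itself: the function name `kmeans`, its parameters, and the use of squared Euclidean distance as the similarity measure are assumptions made for the example.

```python
import random


def kmeans(points, k, init=None, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Hypothetical helper for illustration; returns (centroids, labels).
    `init` optionally supplies k starting centroids; otherwise k points
    are sampled at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance as the dissimilarity).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

The result depends on the starting centroids, which is one reason the literature explores alternative strategies such as the evolutionary approaches this review covers.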


Author(s):  
Ranyiliu Chen ◽  
Zhixin Song ◽  
Xuanqiang Zhao ◽  
Xin Wang

Estimating the difference between quantum data is crucial in quantum computing. However, as typical characterizations of quantum data similarity, the trace distance and quantum fidelity are believed to be exponentially hard to evaluate in general. In this work, we introduce hybrid quantum-classical algorithms for these two distance measures on near-term quantum devices where no assumption about the input state is required. First, we introduce the Variational Trace Distance Estimation (VTDE) algorithm. In particular, we provide a technique to extract the desired spectrum information of any Hermitian matrix by local measurement. A novel variational algorithm for trace distance estimation is then derived from this technique, with the assistance of a single ancillary qubit. Notably, VTDE could avoid the barren plateau issue with logarithmic depth circuits due to a local cost function. Second, we introduce the Variational Fidelity Estimation (VFE) algorithm. We combine Uhlmann’s theorem and the freedom in purification to translate the estimation task into an optimization problem over a unitary on an ancillary system with fixed purified inputs. We then provide a purification subroutine to complete the translation. Both algorithms are verified by numerical simulations and experimental implementations, exhibiting high accuracy for randomly generated mixed states.
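
For reference, the two measures named in this abstract have standard textbook definitions, sketched below for density matrices ρ and σ. The notation (system A, ancilla R, purifications |ψ_ρ⟩ and |φ_σ⟩) is chosen for this note and is not taken from the paper; some authors also define the fidelity as the square of the quantity shown here.

```latex
% Trace distance: half the trace norm of the difference; \lambda_i are the
% eigenvalues of the Hermitian matrix \rho - \sigma.
T(\rho,\sigma) = \tfrac{1}{2}\,\lVert \rho - \sigma \rVert_1
               = \tfrac{1}{2}\,\operatorname{Tr}\sqrt{(\rho-\sigma)^{\dagger}(\rho-\sigma)}
               = \tfrac{1}{2}\sum_i \lvert \lambda_i \rvert

% Fidelity (unsquared convention).
F(\rho,\sigma) = \operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\,\sqrt{\rho}}

% Uhlmann's theorem with fixed purifications: the remaining freedom is a
% unitary U_R acting on the ancillary system R only.
F(\rho,\sigma) = \max_{U_R}\,
  \bigl\lvert \langle \psi_\rho \rvert \,(U_R \otimes I_A)\, \lvert \varphi_\sigma \rangle \bigr\rvert
```

The last line is the form that turns fidelity estimation into an optimization over a unitary on the ancilla, which matches the description of VFE given above.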


2021 ◽  
Author(s):  
Mingyue Li ◽  
Lixin Du ◽  
Jiangying Xu ◽  
Chen Guo

2021 ◽  
Author(s):  
Yaqiang Cao ◽  
Shuai Liu ◽  
Gang Ren ◽  
Qingsong Tang ◽  
Keji Zhao

Investigating chromatin interactions between regulatory regions such as enhancer and promoter elements is vital for a deeper understanding of gene expression regulation. The emerging 3D mapping technologies focusing on enriched signals such as Hi-TrAC/TrAC-looping, compared to Hi-C and variants, reduce the sequencing cost and provide higher interaction resolution for cis-regulatory elements. A robust pipeline is needed for the comprehensive interpretation of these data, especially for loop-centric analysis. Therefore, we have developed a new versatile tool named cLoops2 for the full-stack analysis of 3D chromatin interaction data. cLoops2 consists of core modules for peak-calling, loop-calling, differentially enriched loops calling and loops annotation. It also contains multiple modules to carry out interaction resolution estimation, data similarity estimation, features quantification and aggregation analysis, and visualization. cLoops2, together with documentation and example data, is open source and freely available at GitHub: https://github.com/YaqiangCao/cLoops2.


2021 ◽  
Vol 6 (1) ◽  
pp. 88
Author(s):  
Muhamad Arief Yulianto ◽  
Nurhasanah Nurhasanah

String matching is one family of similarity techniques and can be used to measure how similar two texts are. Rabin-Karp is a string-matching algorithm that handles multiple-pattern search well but is less effective for single-pattern search. The Jaro-Winkler distance algorithm performs approximate string matching and gives its best results when comparing two short strings. This study aims to overcome the shortcomings of the Rabin-Karp algorithm in single-pattern search by combining the Jaro-Winkler and Rabin-Karp algorithms. The combined process starts with pre-processing and forming the K-Gram data, followed by calculating the hash value of each K-Gram with the Rabin-Karp algorithm. The Jaro-Winkler algorithm is then used to find matching hash values and to calculate the percentage of data similarity. Testing compared words, sentences, and journal abstracts that had been rearranged. With the combined algorithm, the average similarity percentage increased for words but decreased for sentences and journal abstracts. The experimental results showed that combining the Jaro-Winkler algorithm with the Rabin-Karp algorithm can improve the accuracy of text similarity measurement.
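
The two building blocks named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how the authors combine the two algorithms, so the code shows a Rabin-Karp rolling hash over K-Grams and a textbook Jaro-Winkler similarity separately. The function names, the hash base and modulus, and the prefix scale `p = 0.1` are choices made for this example.

```python
def kgram_hashes(text, k, base=256, mod=1_000_000_007):
    """Rabin-Karp rolling hash of every k-gram in `text` (illustrative parameters)."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Drop the oldest character, shift, and append the newest one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, j = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted by the length of the common prefix (up to 4)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


if __name__ == "__main__":
    print(kgram_hashes("abcab", 2))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Identical K-Grams always produce identical hash values, so shared hashes flag candidate overlaps cheaply, while Jaro-Winkler scores how close two short strings are even when characters are transposed.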


2021 ◽  
pp. 1-22
Author(s):  
H.Y. Wang ◽  
J.S. Wang ◽  
L.F. Zhu

The fuzzy C-means (FCM) clustering algorithm is widely used in data mining. Its main limitation is that the number of clusters must be specified in advance, so finding the optimal number of clusters is very important. Therefore, a new validity function for the FCM clustering algorithm is proposed to verify the validity of the clustering results. This function is defined based on the intra-class compactness and inter-class separation from the fuzzy membership matrix, the data similarity between classes, and the geometric structure of the data set; its minimum value represents the optimal clustering partition. The proposed clustering validity function and seven traditional clustering validity functions are experimentally verified on four artificial data sets and six UCI data sets. The simulation results show that the proposed validity function obtains the optimal number of clusters more accurately and still finds a more accurate number of clusters when the fuzzy weighted index changes, which indicates strong adaptability and robustness.
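
To make the idea of a validity function concrete, here is a minimal sketch of FCM together with the classical Xie-Beni index, which also combines compactness and separation and is minimized at a good partition. It is a stand-in, not the validity function proposed in this paper, whose formula the abstract does not give. The function names, the explicit `init_centers` argument, and the small distance floor are assumptions made for the example; `m` plays the role of the fuzzy weighted index.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m):
    """Fuzzy membership matrix u[k][i] of point k in cluster i."""
    u = []
    for p in points:
        # Floor the distance so a point sitting on a center does not divide by zero.
        inv = [max(sqdist(p, v), 1e-12) ** (-1.0 / (m - 1)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def fcm(points, init_centers, m=2.0, iters=100, tol=1e-12):
    """Fuzzy C-means from caller-supplied starting centers (hypothetical helper)."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)))
        shift = max(sqdist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness over n times the minimum center separation."""
    c = len(centers)
    compactness = sum(
        u[k][i] ** m * sqdist(points[k], centers[i])
        for k in range(len(points)) for i in range(c))
    separation = min(
        sqdist(centers[i], centers[j])
        for i in range(c) for j in range(i + 1, c))
    return compactness / (len(points) * separation)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, u = fcm(data, init_centers=[(0, 0), (10, 10)])
    print(centers, xie_beni(data, centers, u))
```

In practice one would run FCM for several candidate cluster counts and keep the count with the smallest index value, which is the selection rule the abstract describes for its own function.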

