Data Similarity: Latest Research Papers

Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques work without labelled examples, relying instead on data similarity. One of the most popular areas in this field is clustering, which groups data so that elements within a cluster are highly similar while different clusters remain distinct from one another. The field started with the K-means algorithm, one of the most popular algorithms in machine learning, with extensive applications. Today there are multiple strategies for tackling the clustering problem. This review introduces some of the classical algorithms, with particular emphasis on those based on evolutionary computation, and explains some current applications of clustering to large datasets.
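The K-means idea mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of Lloyd's algorithm with k-means++ seeding and several random restarts, assuming Euclidean distance and data supplied as an (n, d) array with at least k distinct points; `kmeans` is a hypothetical helper, not code from the review.

```python
import numpy as np

def _sq_dists(X, C):
    # (n, k) matrix of squared Euclidean distances from each point to each center
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++ seeding: first center uniform, later ones proportional to squared distance
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = _sq_dists(X, np.asarray(centers)).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.asarray(centers, dtype=float)
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest center
            labels = _sq_dists(X, centers).argmin(axis=1)
            # Update step: move each center to the mean of its points (keep it if the cluster is empty)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        # Final assignment against the final centers, then the within-cluster sum of squares
        labels = _sq_dists(X, centers).argmin(axis=1)
        inertia = float(((X - centers[labels]) ** 2).sum())
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best

# Usage: three well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(center, 0.5, size=(50, 2)) for center in [(0, 0), (10, 0), (0, 10)]])
labels, centers, inertia = kmeans(X, k=3)
```

Keeping the lowest-inertia run out of several restarts is a common guard against the poor local optima that a single K-means run can fall into.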

Variational quantum algorithms for trace distance and fidelity estimation

Quantum Science and Technology, 2021. DOI: 10.1088/2058-9565/ac38ba

Author(s): Ranyiliu Chen, Zhixin Song, Xuanqiang Zhao, Xin Wang

Keyword(s): Quantum Algorithms, Distance Estimation, Distance Measures, Input State, Mixed States, Estimation Task, Trace Distance, Data Similarity, The Difference, Near Term

Estimating the difference between quantum data is crucial in quantum computing. However, the trace distance and quantum fidelity, which are typical characterizations of quantum data similarity, are believed to be exponentially hard to evaluate in general. In this work, we introduce hybrid quantum-classical algorithms for these two distance measures on near-term quantum devices that require no assumptions about the input states. First, we introduce the Variational Trace Distance Estimation (VTDE) algorithm. In particular, we provide a technique to extract the desired spectrum information of any Hermitian matrix by local measurement. A novel variational algorithm for trace distance estimation is then derived from this technique, with the assistance of a single ancillary qubit. Notably, because its cost function is local, VTDE could avoid the barren plateau issue with logarithmic-depth circuits. Second, we introduce the Variational Fidelity Estimation (VFE) algorithm. We combine Uhlmann's theorem and the freedom in purification to translate the estimation task into an optimization problem over a unitary on an ancillary system with fixed purified inputs. We then provide a purification subroutine to complete the translation. Both algorithms are verified by numerical simulations and experimental implementations, exhibiting high accuracy for randomly generated mixed states.
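For small systems, both quantities can be computed exactly on a classical computer, which is useful as ground truth when checking a variational estimate. The NumPy sketch below is not VTDE or VFE; it is the exact classical definition, whose cost grows exponentially with the number of qubits because an n-qubit density matrix is 2^n x 2^n. It assumes the root-fidelity convention F(ρ, σ) = Tr√(√ρ σ √ρ) (some texts define fidelity as the square of this), and the helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a Hermitian positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (V * np.sqrt(w)) @ V.conj().T

def trace_distance(rho, sigma):
    """T = 1/2 ||rho - sigma||_1, i.e. half the sum of |eigenvalues| of the Hermitian rho - sigma."""
    return float(0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum())

def root_fidelity(rho, sigma):
    """F = Tr sqrt(sqrt(rho) sigma sqrt(rho)), computed from the eigenvalues of the PSD product."""
    s = psd_sqrt(rho)
    eig = np.linalg.eigvalsh(s @ sigma @ s)
    return float(np.sqrt(np.clip(eig, 0.0, None)).sum())

def ket_to_density(psi):
    """Density matrix |psi><psi| of a (normalized) state vector."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

# Usage: |0> versus |+> = (|0> + |1>)/sqrt(2)
rho = ket_to_density([1, 0])
sigma = ket_to_density([1, 1])
T, F = trace_distance(rho, sigma), root_fidelity(rho, sigma)
```

In this convention the Fuchs-van de Graaf inequalities, 1 - F <= T <= sqrt(1 - F^2), give a quick consistency check for any estimator of either quantity; for pure states the upper bound holds with equality.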

A Hypergraph-based Method for Pharmaceutical Data Similarity Retrieval

2021. DOI: 10.1145/3490322.3490344

Author(s): Mingyue Li, Lixin Du, Jiangying Xu, Chen Guo

Keyword(s): Similarity Retrieval, Data Similarity

Semi-supervised support vector regression based on data similarity and its application to rock-mechanics parameters estimation

Engineering Applications of Artificial Intelligence, Vol. 104, 2021, pp. 104317. DOI: 10.1016/j.engappai.2021.104317

Author(s): Xi Chen, Weihua Cao, Chao Gan, Yasuhiro Ohyama, Jinhua She, ...

Keyword(s): Support Vector Regression, Rock Mechanics, Parameters Estimation, Support Vector, Data Similarity

An energy-aware scheduling of dynamic workflows using big data similarity statistical analysis in cloud computing

The Journal of Supercomputing, 2021. DOI: 10.1007/s11227-021-04016-8

Author(s): Maziyar Grami

Keyword(s): Cloud Computing, Big Data, Statistical Analysis, Energy Aware, Data Similarity, Energy Aware Scheduling

cLoops2: a full-stack comprehensive analytical tool for chromatin interactions

2021. DOI: 10.1101/2021.07.20.453068

Author(s): Yaqiang Cao, Shuai Liu, Gang Ren, Qingsong Tang, Keji Zhao

Keyword(s): Gene Expression Regulation, Regulatory Elements, Chromatin Interaction, Similarity Estimation, Chromatin Interactions, Data Similarity, Sequencing Cost, Versatile Tool, Resolution Estimation, Aggregation Analysis

Investigating chromatin interactions between regulatory regions such as enhancers and promoters is vital for a deeper understanding of gene expression regulation. Emerging 3D mapping technologies that focus on enriched signals, such as Hi-TrAC/TrAC-looping, reduce sequencing cost and provide higher interaction resolution for cis-regulatory elements compared with Hi-C and its variants. A robust pipeline is needed to interpret these data comprehensively, especially for loop-centric analysis. We have therefore developed a new versatile tool, cLoops2, for full-stack analysis of 3D chromatin interaction data. cLoops2 consists of core modules for peak calling, loop calling, differentially enriched loop calling and loop annotation. It also contains modules for interaction resolution estimation, data similarity estimation, feature quantification, aggregation analysis and visualization. cLoops2, with documentation and example data, is open source and freely available on GitHub: https://github.com/YaqiangCao/cLoops2.
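The abstract does not describe how cLoops2 estimates data similarity, so the sketch below shows only one generic approach, assuming each dataset is a list of paired-end anchor positions on a single chromosome: bin the contacts into a symmetric matrix and take the Pearson correlation of the upper-triangle counts. The function names are hypothetical and are not part of the cLoops2 API.

```python
import numpy as np

def contact_matrix(pairs, bin_size, n_bins):
    """Bin (pos_a, pos_b) anchor positions into a symmetric n_bins x n_bins count matrix."""
    M = np.zeros((n_bins, n_bins))
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        if i < n_bins and j < n_bins:  # drop contacts outside the modeled region
            M[i, j] += 1
            if i != j:
                M[j, i] += 1  # mirror so the matrix stays symmetric
    return M

def contact_similarity(M1, M2):
    """Pearson correlation over the upper triangle (diagonal included) of two contact matrices."""
    iu = np.triu_indices_from(M1)
    return float(np.corrcoef(M1[iu], M2[iu])[0, 1])

# Usage: two small hypothetical replicates binned at 1 kb over a 10 kb region
rep1 = [(1200, 8400), (1300, 8500), (4100, 6100), (500, 700)]
rep2 = [(1250, 8450), (4050, 6150), (520, 690)]
m1, m2 = contact_matrix(rep1, 1000, 10), contact_matrix(rep2, 1000, 10)
sim = contact_similarity(m1, m2)
```

Correlation on raw counts is sensitive to sequencing depth and to the strong distance decay of contact frequency, so practical tools usually normalize or stratify by genomic distance first; the correlation is also undefined if either matrix is constant.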

The Hybrid of Jaro-Winkler and Rabin-Karp Algorithm in Detecting Indonesian Text Similarity

Jurnal Online Informatika, Vol. 6 (1), 2021, pp. 88. DOI: 10.15575/join.v6i1.640

Author(s): Muhamad Arief Yulianto, Nurhasanah Nurhasanah

Keyword(s): String Matching, Pattern Search, Search Process, Test Results, Average Percentage, Data Similarity, Matching Technique, Single Pattern, Merging Process, Matching Type

String matching is one family of similarity techniques and can detect the level of similarity between texts. Rabin-Karp is a string-matching algorithm that can search for multiple patterns but is weak at single-pattern matching. The Jaro-Winkler distance algorithm performs approximate string matching and gives its best results when matching two short strings. This study aims to overcome the shortcomings of the Rabin-Karp algorithm in single-pattern search by combining the Jaro-Winkler and Rabin-Karp methods. The combined process starts with preprocessing and forming k-grams. The Rabin-Karp algorithm then computes a hash value for each k-gram, and the Jaro-Winkler algorithm is used to find matching hash values and calculate the percentage of data similarity. The method was tested by comparing words, sentences, and journal abstracts that had been rearranged. With the combined algorithm, the average similarity percentage increased for words but decreased for sentences and journal abstracts. The experimental results showed that combining the Jaro-Winkler algorithm with the Rabin-Karp algorithm can improve the accuracy of text similarity detection.
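The two building blocks described above can be sketched in plain Python. The Jaro-Winkler part follows the standard definition (scaling factor p = 0.1, common prefix capped at 4 characters); the Rabin-Karp part computes rolling hashes of every k-gram. The abstract does not spell out exactly how the paper combines them, so `fingerprint_similarity`, which scores two texts by the Dice coefficient of their k-gram hash sets, is a hypothetical stand-in rather than the authors' method.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(0, max(n1, n2) // 2 - 1)  # characters match only within this distance
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear in a different order
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score by the length of the common prefix (at most max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def kgram_hashes(text: str, k: int, base: int = 256, mod: int = 1_000_000_007) -> list:
    """Rabin-Karp rolling hash of every k-gram in text, computed in O(len(text))."""
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Drop text[i - k], shift the window, and append text[i]
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes

def fingerprint_similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical combination: Dice coefficient of the two texts' k-gram hash sets."""
    ha, hb = set(kgram_hashes(a, k)), set(kgram_hashes(b, k))
    if not ha and not hb:
        return 1.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))
```

On the textbook pair "MARTHA" and "MARHTA", `jaro` gives about 0.944 and `jaro_winkler` about 0.961, because the shared prefix "MAR" earns the Winkler boost.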

Rainfall data Similarity Assessment of the Coordinated Regional Downscaling Experiments South East Asia Models to Observation in the Bintan Island

IOP Conference Series: Earth and Environmental Science, Vol. 789 (1), 2021, pp. 012051. DOI: 10.1088/1755-1315/789/1/012051

Author(s): Muhamad R. Djuwansah, Ida Narulita, Faiz R. Fajary, Asep Mulyono

Keyword(s): East Asia, Rainfall Data, South East Asia, Similarity Assessment, Data Similarity, Regional Downscaling

A new validity function of FCM clustering algorithm based on intra-class compactness and inter-class separation

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-22. DOI: 10.3233/jifs-210555

Author(s): H.Y. Wang, J.S. Wang, L.F. Zhu

Keyword(s): Clustering Algorithm, Optimal Number, Data Sets, Data Set, Class Separation, Data Similarity, FCM Clustering, Membership Matrix, Clustering Validity, Optimal Number Of Clusters

The Fuzzy C-means (FCM) clustering algorithm is widely used in data mining. However, it has a major limitation: the number of clusters must be specified in advance, so finding the optimal number of clusters is very important. This paper therefore proposes a new validity function for FCM to assess the validity of clustering results. The function is defined from the intra-class compactness and inter-class separation derived from the fuzzy membership matrix, the data similarity between classes, and the geometric structure of the data set; its minimum value indicates the optimal clustering partition. The proposed function and seven traditional clustering validity functions are evaluated experimentally on four artificial data sets and six UCI data sets. The simulation results show that the proposed function identifies the optimal number of clusters more accurately, and still does so when the fuzzy weighting exponent changes, indicating strong adaptability and robustness.
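The proposed validity function is not given here in enough detail to reproduce, so the sketch below pairs a basic FCM implementation with the Xie-Beni index, a classical validity function built on the same compactness-versus-separation idea (lower is better). It assumes Euclidean distance, fuzzifier m = 2, and random initial memberships; all function names are hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-means. Returns U (c x n memberships, columns sum to 1) and V (c x d centers)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)  # normalize so each point's memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # membership-weighted centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # c x n distances
        D = np.fmax(D, 1e-12)  # guard against a point lying exactly on a center
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        U_new = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # centers consistent with the final memberships
    return U, V

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index: fuzzy within-cluster scatter / (n * min squared center distance)."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    scatter = ((U ** m) * D2).sum()
    C2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    min_sep = C2[~np.eye(len(V), dtype=bool)].min()  # closest pair of distinct centers
    return float(scatter / (X.shape[0] * min_sep))

# Usage: sweep the number of clusters on three synthetic blobs and keep the lowest score
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.5, size=(50, 2)) for center in [(0, 0), (10, 0), (0, 10)]])
scores = {c: xie_beni(X, *fcm(X, c)) for c in range(2, 6)}
best_c = min(scores, key=scores.get)
```

Too few clusters inflate the scatter term, while too many place two centers close together and shrink the separation term, which is why the minimum of such an index is used to pick c.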

data similarity
Recently Published Documents

Retrieving of Data Similarity using Metadata on a Data Analysis Competition Platform

Clustering: Finding Patterns in the Darkness

Variational quantum algorithms for trace distance and fidelity estimation

A Hypergraph-based Method for Pharmaceutical Data Similarity Retrieval

Semi-supervised support vector regression based on data similarity and its application to rock-mechanics parameters estimation

An energy-aware scheduling of dynamic workflows using big data similarity statistical analysis in cloud computing

cLoops2: a full-stack comprehensive analytical tool for chromatin interactions

The Hybrid of Jaro-Winkler and Rabin-Karp Algorithm in Detecting Indonesian Text Similarity

Rainfall data Similarity Assessment of the Coordinated Regional Downscaling Experiments South East Asia Models to Observation in the Bintan Island

A new validity function of FCM clustering algorithm based on intra-class compactness and inter-class separation
