similarity comparison

Author(s):  
Chen Zhao ◽  
Takehito Utsuro ◽  
Yasuhide Kawada

This paper addresses the automatic recognition of out-of-topic documents in a small set of similar documents that are expected to share a common topic. The objective is to remove noisy documents from the set. A topic-model-based classification framework is proposed for discovering out-of-topic documents. The paper introduces a new concept of annotated search engine suggests, in which the search queries used to find a page are taken as representations of that page's content. Word embeddings are adopted to create distributed representations of words and documents, and similarity comparison is performed on the search engine suggests. The results show that search engine suggests can be highly accurate semantic representations of textual content, and that the proposed document analysis algorithm, using this representation as a relevance measure, gives satisfactory in-topic content filtering performance compared with the baseline technique of topic probability ranking.
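The similarity comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each document is represented by a list of suggest tokens, that a word embedding table is already available as a plain dictionary, and that a document is flagged when the cosine similarity between its averaged vector and the centroid of the set falls below a threshold. The names `mean_vector`, `flag_out_of_topic`, the toy embeddings, and the threshold value are all hypothetical.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the embedding vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def flag_out_of_topic(doc_suggests, embeddings, threshold=0.8):
    """Return sorted ids of documents whose vector is far from the set centroid.

    doc_suggests maps a document id to its list of suggest tokens (assumed input).
    """
    doc_vecs = {d: mean_vector(toks, embeddings) for d, toks in doc_suggests.items()}
    doc_vecs = {d: v for d, v in doc_vecs.items() if v is not None}
    if not doc_vecs:
        return []
    dim = len(next(iter(doc_vecs.values())))
    centroid = [sum(v[i] for v in doc_vecs.values()) / len(doc_vecs) for i in range(dim)]
    return sorted(d for d, v in doc_vecs.items() if cosine(v, centroid) < threshold)

# Toy two-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "python": [1.0, 0.1], "code": [0.9, 0.2], "compiler": [0.8, 0.0],
    "recipe": [0.0, 1.0], "cake": [0.1, 0.9],
}
toy_docs = {
    "a": ["python", "code"],
    "b": ["code", "compiler"],
    "c": ["recipe", "cake"],
}
print(flag_out_of_topic(toy_docs, toy_embeddings, threshold=0.8))
```

In this sketch the centroid includes the outlier itself, which is tolerable for small sets with few noisy documents; a leave-one-out centroid would be a natural variation.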


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Yan Wang ◽  
Peng Jia ◽  
Cheng Huang ◽  
Jiayong Liu ◽  
Peisong He

Binary code similarity comparison is a technique that determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, it is challenging to design a robust code similarity comparison engine, since different compilation settings can make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overhead, low robustness, or poor scalability. In this paper, a novel solution, HBinSim, is proposed that employs multiview features of the function to address these challenges. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim then analyzes the function and constructs a syntactic attribute control flow graph and a semantic attribute control flow graph for it. Next, a hierarchical attention graph embedding network is designed for processing the graph-structured data. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It has three levels of attention mechanisms, applied at the instruction, basic block, and function levels, enabling it to attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms state-of-the-art binary code similarity comparison tools by a large margin in clone search across diverse compilation settings. A real-world vulnerability search case further demonstrates the usefulness of our system.
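The idea of attending differently to content at each level of a function can be sketched with plain attention pooling. This is not HBinSim: the sketch assumes instruction vectors are already given, uses fixed query vectors in place of learned parameters, ignores the control flow graph edges entirely, and applies only two pooling steps (instructions into a block vector, blocks into a function vector). The names `attention_pool`, `embed_function`, and both query vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    """Weighted sum of vectors; weights are the softmax of dot products with query."""
    weights = softmax([sum(a * b for a, b in zip(v, query)) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def embed_function(blocks, instr_query, block_query):
    """Pool instruction vectors into block vectors, then block vectors into one vector.

    blocks is a list of basic blocks, each a non-empty list of instruction vectors.
    """
    block_vecs = [attention_pool(instrs, instr_query) for instrs in blocks]
    return attention_pool(block_vecs, block_query)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Two toy "functions" with invented 2-d instruction vectors.
func_a = [[[1.0, 0.0], [0.8, 0.2]], [[0.9, 0.1]]]
func_b = [[[0.9, 0.1]], [[1.0, 0.0], [0.7, 0.3]]]
q_instr, q_block = [1.0, 0.0], [0.5, 0.5]

similarity = cosine(embed_function(func_a, q_instr, q_block),
                    embed_function(func_b, q_instr, q_block))
print(round(similarity, 3))
```

In a trained model the queries and the instruction vectors would be learned jointly, and message passing over the attributed control flow graphs would precede the block-level pooling.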


2021 ◽  
Vol 17 (1) ◽  
pp. 17-24
Author(s):  
Kyung-Yeob Park ◽  
Joo-Sung Kim ◽  
Hyun-Soo Kim ◽  
Dong-Myung Shin

2021 ◽  
Author(s):  
Feng Deng ◽  
Jeffrey Zheng

Many studies on COVID-19 have been carried out, and it is of interest to apply methods and models that process the whole RNA sequence. Similarity comparison of SARS-CoV-2 genomes plays a key role in tracing the natural origin of the virus, and further exploration is required. In this paper, an innovative transformation from a 2D density matrix to a 1D measuring vector is proposed, based on the A5 module of the MAS, for visualization. The core transformation projects whole RNA sequences of multiple coronaviruses onto 2D matrices and then forms 1D measuring vectors on variant maps. The relationships among SARS-CoV-2 genomes are compared by their similarity properties, and a genomic index of entropy quantities is applied to classify the results into groups.

