similarity score
Recently Published Documents


TOTAL DOCUMENTS

248
(FIVE YEARS 152)

H-INDEX

13
(FIVE YEARS 3)

2022 ◽  
Vol 40 (4) ◽  
pp. 1-45
Author(s):  
Weiren Yu ◽  
Julie McCann ◽  
Chengyuan Zhang ◽  
Hakan Ferhatosmanoglu

SimRank is an attractive link-based similarity measure used in fertile fields such as Web search and sociometry. However, the existing deterministic method by Kusumoto et al. [24] for retrieving SimRank does not always produce high-quality similarity results, as it fails to accurately obtain the diagonal correction matrix D. Moreover, SimRank has a "connectivity trait" problem: increasing the number of paths between a pair of nodes can decrease their similarity score. The best-known remedy, SimRank++ [1], cannot completely fix this problem, since its score is still zero if the two nodes have no common in-neighbors. In this article, we study fast, high-quality link-based similarity search on billion-scale graphs. (1) We first devise a "varied-D" method to accurately compute SimRank in linear memory. We also aggregate duplicate computations, which reduces the time of [24] from quadratic to linear in the number of iterations. (2) We propose a novel "cosine-based" SimRank model to circumvent the "connectivity trait" problem. (3) To substantially speed up partial-pairs "cosine-based" SimRank search on large graphs, we devise an efficient dimensionality reduction algorithm, PSR#, with guaranteed accuracy. (4) We give mathematical insights into the semantic difference between SimRank and its variant, and correct an argument in [24] that "if D is replaced by a scaled identity matrix (1-γ)I, their top-K rankings will not be affected much". (5) We propose a novel method that can accurately convert from Li et al.'s SimRank S̃ to Jeh and Widom's SimRank S. (6) We propose GSR#, a generalisation of our "cosine-based" SimRank model, to quantify pairwise similarities across two distinct graphs, unlike SimRank, which would assess nodes across two graphs as completely dissimilar. Extensive experiments on various datasets demonstrate the superiority of our proposed approaches in terms of search quality, computational efficiency, accuracy, and scalability on billion-edge graphs.
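For context, below is a minimal sketch of the classic Jeh-Widom SimRank fixed-point iteration that this line of work builds on. It is the textbook O(n²)-memory formulation, not the article's linear-memory "varied-D" method or PSR#; the damping factor C and iteration count are illustrative.

```python
import numpy as np

def simrank(adj: np.ndarray, C: float = 0.6, iters: int = 10) -> np.ndarray:
    """Textbook Jeh-Widom SimRank on a small directed graph.

    adj[i, j] = 1 iff there is an edge i -> j. This naive fixed-point
    iteration is for illustration only, not the article's method.
    """
    n = adj.shape[0]
    in_deg = adj.sum(axis=0)
    W = adj / np.maximum(in_deg, 1)  # column-normalized: averages over in-neighbors
    S = np.eye(n)
    for _ in range(iters):
        S = C * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)  # the exact diagonal handling is what D corrects
    return S

# Two nodes sharing an in-neighbor get a nonzero score; nodes with no common
# in-neighbors score zero -- the zero-score behavior the article's model fixes.
A = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])  # node 0 points to 1 and 2
print(simrank(A)[1, 2])  # ~0.6
```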


2022 ◽  
Vol 19 (3) ◽  
pp. 2774-2799
Author(s):  
Lu Yu ◽  
Yuliang Lu ◽  
Yi Shen ◽  
Jun Zhao ◽  
...  

Program-wide binary code diffing is widely used in binary analysis, for example in vulnerability detection. Mature tools, including BinDiff and TurboDiff, perform program-wide diffing using a rigorous comparison basis that varies across versions, optimization levels, and architectures, leading to relatively inaccurate comparison results. In this paper, we propose a program-wide binary diffing method based on a neural network model that can diff across versions, optimization levels, and architectures. We analyze the target comparison files at four different granularities and implement the diffing through both a top-down process and a bottom-up process across these granularities. The top-down process narrows the comparison scope, selecting candidate functions that are likely to be similar according to their call relationships. The neural network model is applied in the bottom-up process to vectorize the semantic features of candidate functions into matrices, and the similarity score is calculated to obtain the correspondence between the functions being compared. The bottom-up process improves comparison accuracy, while the top-down process guarantees efficiency. We have implemented a prototype, PBDiff, and verified that it outperforms the state-of-the-art tools BinDiff, Asm2vec, and TurboDiff. The effectiveness of PBDiff is further illustrated through a case study of diffing and vulnerability detection in real-world firmware files.
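As a concrete illustration of the bottom-up scoring step, the sketch below computes a similarity score between two function embeddings and filters candidate pairs produced by a top-down call-graph pass. The embedding model, the `embed` mapping, and the threshold are stand-ins; PBDiff's actual network and features are not reproduced here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score between two semantic feature vectors of functions."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def match_candidates(candidate_pairs, embed, threshold=0.8):
    """Keep candidate pairs (pre-filtered by call relationship in the
    top-down pass) whose embeddings score above a threshold.
    `embed` maps a function name to its vector; purely illustrative."""
    return [(f, g) for f, g in candidate_pairs
            if cosine_similarity(embed[f], embed[g]) >= threshold]
```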


2021 ◽  
Author(s):  
Damien Olivier-Jimenez ◽  
Zakaria Bouchouireb ◽  
Simon Ollivier ◽  
Julia Mocquard ◽  
Pierre-Marie Allard ◽  
...  

In the context of untargeted metabolomics, molecular networking is a popular and efficient tool that organizes and simplifies mass spectrometry fragmentation data (LC-MS/MS) by clustering ions based on a cosine similarity score. However, the nature of the ion species is rarely taken into account, causing redundancy, as a single compound may be present in different forms throughout the network. Taking advantage of the presence of such redundant ions, we developed a new method named MolNotator. Using the different ion species produced by a molecule during ionization (adducts, dimers, trimers, in-source fragments), a predicted molecule node (or neutral node) is created by triangulation, ultimately yielding the molecule's calculated mass. These neutral nodes provide researchers with several advantages. Firstly, each molecule is represented in its ionization context, connected to all of its produced ions and, indirectly, to some coeluted compounds, thereby also highlighting unexpectedly widespread adduct species. Secondly, the predicted neutrals serve as anchors to merge the complementary positive and negative ionization modes into a single network. Lastly, dereplication is improved by the use of all available ions connected to the neutral nodes, and the computed molecular masses can be used for exact-mass dereplication. MolNotator is available as a Python library and was validated using lichen database spectra acquired on an Orbitrap, computing neutral molecules for >90% of the 156 molecules in the dataset. By focusing on actual molecules instead of ions, MolNotator greatly facilitates the selection of molecules of interest.
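The triangulation idea can be sketched numerically: two ion species that imply the same neutral monoisotopic mass (within a tolerance) are evidence for one molecule, which becomes the neutral node. The adduct table below covers only a few common positive-mode species and is illustrative, not MolNotator's internal list.

```python
PROTON = 1.007276   # mass of a proton (H+), Da
SODIUM = 22.989218  # mass of Na+, Da

# m/z -> neutral monoisotopic mass for a few common positive-mode species
ION_TO_NEUTRAL = {
    "[M+H]+":  lambda mz: mz - PROTON,
    "[M+Na]+": lambda mz: mz - SODIUM,
    "[2M+H]+": lambda mz: (mz - PROTON) / 2.0,  # protonated dimer
}

def agree_on_neutral(mz1: float, sp1: str, mz2: float, sp2: str,
                     tol: float = 0.005) -> bool:
    """True if two ions triangulate to the same neutral mass, the condition
    under which a neutral node would be created."""
    return abs(ION_TO_NEUTRAL[sp1](mz1) - ION_TO_NEUTRAL[sp2](mz2)) <= tol

# A molecule of mass 300.1: its [M+H]+ at 301.107276 and [M+Na]+ at 323.089218
print(agree_on_neutral(301.107276, "[M+H]+", 323.089218, "[M+Na]+"))  # True
```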


Biosensors ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 504
Author(s):  
Vicky Mudeng ◽  
Minseok Kim ◽  
Se-woon Choe

Diffuse optical tomography is an emerging non-invasive optical modality that evaluates tissue information by reconstructing the distribution of optical properties. Two procedures are performed to produce reconstructed absorption and reduced-scattering images, which provide structural information that can be used to locate inclusions within tissues with the assistance of a known light intensity around the boundary. These procedures are referred to as the forward problem and the inverse solution. Once the reconstructed image is obtained, subjective measurement is the conventional way to assess it. Hence, in this study, we developed an algorithm designed to numerically assess reconstructed images and identify inclusions using the structural similarity (SSIM) index. We compared four SSIM algorithms on 168 simulated reconstructed images with the same inclusion position but different contrast ratios and inclusion sizes. A multiscale, improved SSIM containing a sharpness parameter (MS-ISSIM-S) was proposed to approximate human visual perception. The results indicated that the proposed MS-ISSIM-S agrees with human visual perception, demonstrating a reduction in similarity score across varying contrasts for inclusions of similar size; this metric is thus promising for the objective numerical assessment of images reconstructed by diffuse optical tomography.
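For reference, the baseline SSIM comparison that the proposed MS-ISSIM-S extends can be computed with scikit-image. This sketch scores a reconstruction against a ground-truth inclusion map; the multiscale and sharpness extensions of MS-ISSIM-S are not reproduced here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def reconstruction_ssim(truth: np.ndarray, recon: np.ndarray) -> float:
    """Baseline SSIM between a ground-truth absorption map and its
    reconstruction; a stand-in for subjective visual assessment."""
    data_range = float(truth.max() - truth.min())
    return structural_similarity(truth, recon, data_range=data_range)
```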


2021 ◽  
Vol 927 (1) ◽  
pp. 011002

All papers published in this volume of IOP Conference Series: Earth and Environmental Science have been peer reviewed through processes administered by the Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a proceedings journal published by IOP Publishing.
• Type of peer review: single-blind
• Criteria used by reviewers when accepting/declining papers, and opportunity to resubmit after revisions:
∘ The review considered five (5) aspects: 1) relevance to the topics, 2) novelty and originality, 3) clarity, 4) systematic presentation, 5) analysis techniques and deduction.
∘ Each reviewer scored each aspect from 1 to 4. Based on the total points across these five aspects: 1) Definitely Accept: 16-20, 2) Accept: 11-15, 3) Possibly Accept: 7-10, 4) Reject: 5-6.
∘ There was an opportunity to resubmit articles after revisions.
• Conference submission management system: papers were uploaded via EDAS (https://edas.info/)
• Number of submissions received: 70 papers
• Number of submissions sent for review: 60 papers
• Number of submissions accepted: 48 papers
• Acceptance rate (number of submissions accepted / number of submissions received × 100): 68.57%
• Average number of reviews per paper: 2
• Total number of reviewers involved: 36
• Additional info on review process (e.g., plagiarism check system):
∘ The editors checked similarity scores for plagiarism using https://www.turnitin.com/.
∘ To be accepted, the similarity score must be less than 15%.
• Contact person for queries: Ayodya Pradhipta Tenggara, Universitas Gadjah Mada, [email protected]


Author(s):  
Salma Adel Elzeheiry ◽  
N. E. Mekky ◽  
A. Atwan ◽  
Noha A. Hikal

Recommendation systems (RSs) are used to obtain advice that supports decision-making. RSs have the shortcoming that the system cannot draw inferences about users or items for which it has not yet gathered sufficient information; this is known as the cold-start issue. To alleviate the user cold-start issue, the proposed recommendation algorithm combines tag data with logistic regression classification to predict movie probabilities for a new user. Alternating least squares is first used to extract product features; the feature vectors are then reduced by combining principal component analysis with logistic regression, which predicts the probability of each movie genre. Finally, the most relevant tags, selected by similarity score, are combined with these probabilities to find the top-N highest-scoring movies for the user. The proposed model is assessed using the root mean square error (RMSE), the mean absolute error (MAE), recall@N, and precision@N, and is applied to the 1M, 10M, and 20M MovieLens datasets, resulting in accuracies of 0.8806, 0.8791, and 0.8739, respectively.
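A minimal sketch of the described pipeline, assuming ALS item factors have already been computed (e.g., with an implicit-feedback library); the function names and hyperparameters are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def train_genre_model(item_factors: np.ndarray, genre_labels: np.ndarray,
                      n_components: int = 16):
    """Reduce ALS item factors with PCA, then fit logistic regression
    to predict genre probabilities (the cold-start signal)."""
    pca = PCA(n_components=n_components).fit(item_factors)
    clf = LogisticRegression(max_iter=1000).fit(
        pca.transform(item_factors), genre_labels)
    return pca, clf

def genre_probabilities(pca: PCA, clf: LogisticRegression,
                        item_factors: np.ndarray) -> np.ndarray:
    # These probabilities would then be combined with tag similarity
    # scores to rank the top-N movies for a new user.
    return clf.predict_proba(pca.transform(item_factors))
```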


2021 ◽  
Vol 909 (1) ◽  
pp. 011002

All papers published in this volume of IOP Conference Series: Earth and Environmental Science have been peer reviewed through processes administered by the Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a proceedings journal published by IOP Publishing.
• Type of peer review: triple-blind. The reviewers are anonymous, and the authors' identities are unknown to both reviewers and editors. Articles are anonymized at the submission stage and handled in a way that minimizes any potential bias towards the author(s).
• Conference submission management system: email correspondence among the conference committee, authors, appointed reviewers, and participants.
• Number of submissions received: 23 papers
• Number of submissions sent for review: 18 papers
• Number of submissions accepted: 18 papers
• Acceptance rate (number of submissions accepted / number of submissions received × 100): 78.26%
• Average number of reviews per paper: 2.0
• Total number of reviewers involved: 5
• Additional info on review process: The editors pre-selected manuscripts to send to reviewers based on the scope of the conference. All manuscripts were checked for similarity using Turnitin; accepted manuscripts did not exceed a 10% similarity score. The manuscripts were then reviewed by referees with academic standing who are scientific experts in their fields. The reviewers applied the following criteria: relevance to the scope, contribution to science, originality, systematic presentation, and writing accuracy. Each reviewer then recommended one of: accept, accept with minor/major revision, or reject. The committee sent the papers to the authors to revise accordingly. The revised version was evaluated by the editors, who then sent it to the reviewers again for re-evaluation; if required, the review process was repeated. The decision to accept or reject the final papers was based on the reviewers' suggestions.
• Contact person for queries: Prof. Dr. Chairil Anwar Siregar (Chief Editor), [email protected], Center for Standardization of Sustainable Forest Management Instruments, Agency for Standardization of Environment and Forestry Instruments, Ministry of Environment and Forestry of the Republic of Indonesia.


2021 ◽  
Vol 12 (5) ◽  
pp. 1-28
Author(s):  
Hridoy Sankar Dutta ◽  
Mayank Jobanputra ◽  
Himani Negi ◽  
Tanmoy Chakraborty

YouTube sells advertisements on posted videos, which in turn enables content creators to monetize their videos. As an unintended consequence, this has proliferated various illegal activities such as artificial boosting of views, likes, comments, and subscriptions. We refer to such videos (gaining likes and comments artificially) and channels (gaining subscriptions artificially) as "collusive entities." Detecting such collusive entities is an important yet challenging task. Existing solutions mostly deal with the problem of spotting fake views, spam comments, fake content, and so on, and oftentimes ignore how such fake activities emerge via collusion. Here, we collect a large dataset consisting of two types of collusive entities on YouTube: videos submitted to gain collusive likes and comments, and channels submitted to gain collusive subscriptions. We begin by providing an in-depth analysis of collusive entities on YouTube fostered by various blackmarket services. Following this, we propose models to detect three types of collusive YouTube entities: videos seeking collusive likes, channels seeking collusive subscriptions, and videos seeking collusive comments. The third type of entity is associated with temporal information. To detect videos and channels seeking collusive likes and subscriptions, respectively, we utilize one-class classifiers trained on our curated collusive entities and a set of novel features. The SVM-based model shows significant performance, with true positive rates of 0.911 and 0.910 for detecting collusive videos and collusive channels, respectively. To detect videos seeking collusive comments, we propose CollATe, a novel end-to-end neural architecture that leverages time-series information of posted comments along with static video metadata. CollATe is composed of three components: a metadata feature extractor (which derives metadata-based features from videos), an anomaly feature extractor (which utilizes the time-series data to detect sudden changes in commenting activity), and a comment feature extractor (which utilizes the text of comments posted during collusion and computes a similarity score between the comments). Extensive experiments show the effectiveness of CollATe (with a true positive rate of 0.905) over the baselines.
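The one-class classification step can be sketched as follows: the detector is trained only on curated collusive entities and flags unseen entities that fall inside the learned support. The feature extraction is assumed, and the kernel and hyperparameters are illustrative; this is not the paper's exact model.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_collusion_detector(collusive_features: np.ndarray) -> OneClassSVM:
    """Train only on known collusive videos/channels (one-class setting)."""
    return OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(collusive_features)

def flag_collusive(model: OneClassSVM, features: np.ndarray) -> np.ndarray:
    # predict() returns +1 inside the learned region (collusive-like), -1 outside
    return model.predict(features) == 1
```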

