Efficient model similarity estimation with robust hashing

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and they struggle to exploit the auxiliary information of nodes and edges. Network embedding offers a promising way to alleviate these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information, which facilitates the application of machine learning algorithms in subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied to homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.

Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189593 ◽

2021 ◽

pp. 1-10

Author(s):

Hye-Jeong Song ◽

Tak-Sung Heo ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Parallel Structure ◽

Short Term ◽

Similarity Estimation ◽

Accurate Judgment ◽

Proposed Model ◽

Sentence Similarity ◽

Long Short Term Memory

Sentence similarity evaluation is a significant task in natural language processing, used in machine translation, classification, and information extraction. Given two sentences, the task is to judge accurately whether their meanings are equivalent even if their words and contexts differ. To this end, existing studies have measured sentence similarity by focusing on the analysis of words, morphemes, and letters. This study measures sentence similarity using Sent2Vec, a sentence embedding, together with morpheme word embeddings. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM) network. Self-attention is applied to the features transformed by the Bi-LSTM. The vectors produced by the 1D-CNN and by self-attention are then reduced through global max pooling and global average pooling, respectively, to extract specific values. The resulting vectors are concatenated with the vector generated by Sent2Vec and represented as a single vector. This vector is input to a softmax layer, which determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
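The pooling, concatenation, and softmax steps described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy feature values, and the reading that max pooling is applied to the 1D-CNN features and average pooling to the self-attention features are assumptions made for illustration, and the trained layers themselves (1D-CNN, Bi-LSTM, self-attention, Sent2Vec) are replaced by hand-written feature lists.

```python
import math


def global_max_pool(features):
    """Reduce a sequence of feature vectors to one vector by taking,
    for each channel, the maximum over all time steps."""
    return [max(channel) for channel in zip(*features)]


def global_avg_pool(features):
    """Reduce a sequence of feature vectors to one vector by taking,
    for each channel, the mean over all time steps."""
    return [sum(channel) / len(channel) for channel in zip(*features)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def combine(cnn_features, attention_features, sentence_vector):
    """Concatenate the pooled CNN features, the pooled attention
    features, and a sentence embedding into a single vector.
    All three inputs are hypothetical stand-ins for model outputs."""
    return (
        global_max_pool(cnn_features)
        + global_avg_pool(attention_features)
        + list(sentence_vector)
    )
```

For example, `combine([[1, 5], [3, 2]], [[1, 5], [3, 2]], [0.1])` pools two time steps of two channels each and yields a five-element vector; in a full model a dense layer would map that vector to class scores before `softmax` is applied.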

Graphical Displays for Understanding SEM Model Similarity

Structural Equation Modeling A Multidisciplinary Journal ◽

10.1080/10705511.2017.1334206 ◽

2017 ◽

Vol 24 (6) ◽

pp. 803-818 ◽

Cited By ~ 6

Author(s):

Keke Lai ◽

Samuel B. Green ◽

Roy Levy

Keyword(s):

Graphical Displays ◽

Model Similarity

The Combined Method of Semantic Similarity Estimation of Problem Oriented Knowledge on the Basis of Evolutionary Procedures

Advances in Intelligent Systems and Computing - Artificial Intelligence Trends in Intelligent Systems ◽

10.1007/978-3-319-57261-1_8 ◽

2017 ◽

pp. 74-83 ◽

Cited By ~ 2

Author(s):

V. V. Bova ◽

E. V. Nuzhnov ◽

V. V. Kureichik

Keyword(s):

Semantic Similarity ◽

Combined Method ◽

Similarity Estimation

A Framework of Cloud Model Similarity-Based Quality Control Method in Data-Driven Production Process

Mathematical Problems in Engineering ◽

10.1155/2020/7153841 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Sheng Hu ◽

Shuanjun Song ◽

Wenhui Liu

Keyword(s):

Quality Control ◽

Production Process ◽

Control Method ◽

Cloud Model ◽

Data Driven ◽

Monitoring Method ◽

Control Approach ◽

Acceptable Method ◽

And Performance ◽

Model Similarity

Because the process quality state is difficult to analyze and monitor under manufacturing big data, this paper proposes a quality fluctuation monitoring method for data-driven production processes based on data cloud model similarity. First, the randomness of state fluctuation is characterized by entropy and hyperentropy features. Then, the cloud pool drive model between quality fluctuation monitoring parameters is built. On this basis, a cloud model similarity degree is defined and calculated from the perspective of the maximum fluctuation border to realize process state analysis and monitoring. Finally, an experiment is conducted to verify the adaptability and performance of the cloud model similarity-based quality control approach. The results indicate that the proposed approach is a feasible and acceptable method for process fluctuation monitoring and quality stability analysis in the production process.
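As background for the entropy and hyperentropy features mentioned in this abstract, the sketch below estimates the three numerical characteristics of a normal cloud model (expectation Ex, entropy En, hyperentropy He) from samples, using the commonly cited backward cloud generator without certainty degrees. This is an assumption about the kind of estimator involved, not the paper's procedure: the abstract does not give its formulas, and the similarity degree based on the maximum fluctuation border is not reproduced here. The function name and sample values are hypothetical.

```python
import math
import statistics


def backward_cloud(samples):
    """Estimate (Ex, En, He) of a normal cloud model from samples.

    Ex is the sample mean, En is sqrt(pi / 2) times the mean absolute
    deviation from Ex, and He is sqrt(S^2 - En^2), where S^2 is the
    sample variance. The difference is clamped at zero because it can
    turn slightly negative for small or nearly Gaussian samples.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    ex = statistics.mean(samples)
    mean_abs_dev = sum(abs(x - ex) for x in samples) / len(samples)
    en = math.sqrt(math.pi / 2) * mean_abs_dev
    variance = statistics.variance(samples)  # unbiased, divides by n - 1
    he = math.sqrt(max(variance - en ** 2, 0.0))
    return ex, en, he
```

Calling `backward_cloud([1, 2, 3, 4, 5])` returns an expectation of 3 together with a positive entropy and a non-negative hyperentropy; a larger hyperentropy would indicate a less stable spread of the monitored quality parameter.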

Musical perceptual similarity estimation using interactive genetic algorithm

IEEE Congress on Evolutionary Computation ◽

10.1109/cec.2010.5586527 ◽

2010 ◽

Cited By ~ 3

Author(s):

Shangfei Wang ◽

Hua Zhu

Keyword(s):

Genetic Algorithm ◽

Perceptual Similarity ◽

Interactive Genetic Algorithm ◽

Similarity Estimation

A study of modeling experiments of the vibration behavior of elevated railway box girder

Journal of Vibration and Control ◽

10.1177/1077546318807283 ◽

2018 ◽

Vol 25 (5) ◽

pp. 984-995 ◽

Cited By ~ 4

Author(s):

Kun Luo ◽

Xiaoyan Lei

Keyword(s):

High Speed ◽

Similarity Theory ◽

Longitudinal Direction ◽

Calculation Model ◽

Scale Model ◽

Box Girder ◽

Vibration Transmissibility ◽

Girder Bridge ◽

Structure Simplification ◽

Model Similarity

Based on model similarity theory, this article derives the model similarity relationship of the elevated railway box girder at the elastic stage and designs a 1/10 scale box girder model, taking a 32 m simply-supported box girder bridge from the Beijing–Shanghai Railway as the prototype. It then verifies the validity of the model design and the dynamic similarity between the 1/10 model and the prototype through constraint mode and free mode experiments on the 1/10 scale model, together with transient finite element calculation. The dynamic calculation model is used to analyze the errors introduced during production of the model and the effect of simplifying the model structure on the box girder mode frequency and vibration response. Finally, the article studies the vibration transmissibility characteristics between the plates and along the longitudinal direction by means of model testing. It also discusses the effect of different bridge support stiffnesses on the box girder vibration. The results presented in this paper provide a method for forecasting and evaluating the environmental vibration of existing or planned high-speed railways.

Adapting Gloss Vector Semantic Relatedness Measure for Semantic Similarity Estimation: An Evaluation in the Biomedical Domain

Semantic Technology - Lecture Notes in Computer Science ◽

10.1007/978-3-319-14122-0_11 ◽

2014 ◽

pp. 129-145 ◽

Cited By ~ 4

Author(s):

Ahmad Pesaranghader ◽

Azadeh Rezaei ◽

Ali Pesaranghader

Keyword(s):

Semantic Similarity ◽

Semantic Relatedness ◽

Biomedical Domain ◽

Similarity Estimation

Streaming histogram sketching for rapid microbiome analytics

10.1101/408070 ◽

2018 ◽

Author(s):

Will P. M. Rowe ◽

Anna Paola Carrieri ◽

Cristina Alcon-Giner ◽

Shabhonam Caim ◽

Alex Shaw ◽

...

Keyword(s):

Locality Sensitive Hashing ◽

Genomic Research ◽

Compact Representation ◽

Sample Type ◽

Sequencing Data ◽

Similarity Estimation ◽

Microbiome Research ◽

Microbiome Data ◽

Similarity Searches

Motivation: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets, and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching, and classification of microbiome samples in near real-time.

Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can be used to efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we show that histosketches can be used to train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a Random Forest Classifier that could accurately predict whether the neonate had received antibiotic treatment (95% accuracy, precision 97%) and could subsequently be used to classify microbiome data streams in less than 12 seconds. We provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2GB microbiome in 50 seconds on a standard laptop using 4 cores, with the sketch occupying 3000 bytes of disk space.

Availability: Our implementation (HULK) is written in Go and is available at: https://github.com/will-rowe/hulk (MIT License)
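The idea of estimating Jaccard similarity from compact sketches of k-mer content can be illustrated with a short Python example. Note the substitution: this sketch uses plain MinHash over unweighted k-mer sets, not the histosketch method or the HULK implementation described in the abstract, which sketches weighted k-mer spectra from a stream. All function names, the choice of BLAKE2b as the hash, and the default sketch size are assumptions made for illustration.

```python
import hashlib


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def _seeded_hash(item, seed):
    """Deterministic 64-bit hash of a string under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(items, num_hashes=128):
    """Build a MinHash sketch: for each seeded hash function, keep
    the smallest hash value observed over all items."""
    if not items:
        raise ValueError("cannot sketch an empty collection")
    return [
        min(_seeded_hash(item, seed) for item in items)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of sketch
    positions where the two sketches hold the same minimum."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)
```

A sketch has a fixed length regardless of input size, so two samples can be compared by reading only `num_hashes` integers each. The estimate converges on the exact value as `num_hashes` grows, and matching sketch positions are also what a locality sensitive hashing index would use to bucket similar samples together.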
