Efficient model similarity estimation with robust hashing

Author(s):  
Salvador Martínez ◽  
Sébastien Gérard ◽  
Jordi Cabot
Author(s):  
Zhixian Liu ◽  
Qingfeng Chen ◽  
Wei Lan ◽  
Jiahai Liang ◽  
Yiping Pheobe Chen ◽  
...  

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and they make it difficult to exploit the auxiliary information of nodes and edges. Network embedding offers a promising way to alleviate these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information, which facilitates the application of machine learning algorithms in subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years and has shown superior performance over traditional methods. However, there is no systematic review of this topic. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding techniques applied to homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential directions for future research are also discussed.
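As a rough illustration of the random walk family mentioned in the abstract above, the following sketch generates uniform random walks over a small graph, in the style popularized by DeepWalk. It is not taken from the surveyed article. The toy graph, the node names, and the function name `random_walks` are all hypothetical. In a full pipeline the walks would be passed to a skip-gram model to learn the node vectors; that step is not shown here.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks from every node of an adjacency-list graph.

    graph: dict mapping each node to a list of its neighbors.
    Returns a list of walks, each a list of node names.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy network: two drugs linked through shared targets.
toy_graph = {
    "drug_A": ["target_1", "target_2"],
    "drug_B": ["target_2"],
    "target_1": ["drug_A"],
    "target_2": ["drug_A", "drug_B"],
}

walks = random_walks(toy_graph, walk_length=5, walks_per_node=3)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Nodes that co-occur often in such walks end up close together once the walks are embedded, which is the intuition behind using the resulting vectors for similarity estimation.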


2021 ◽  
pp. 1-10
Author(s):  
Hye-Jeong Song ◽  
Tak-Sung Heo ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. Given two sentences, an accurate judgment should be made as to whether their meanings are equivalent, even if their words and contexts differ. To this end, existing studies have measured sentence similarity by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM) network. Self-attention is applied to the features transformed by the Bi-LSTM. Subsequently, the vectors produced by the 1D-CNN and by self-attention are converted through global max pooling and global average pooling, respectively, to extract specific values. The vectors generated through this process are concatenated with the vector generated by Sent2Vec and represented as a single vector. This vector is input to a softmax layer, which finally determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.


2017 ◽  
Vol 24 (6) ◽  
pp. 803-818 ◽  
Author(s):  
Keke Lai ◽  
Samuel B. Green ◽  
Roy Levy

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Sheng Hu ◽  
Shuanjun Song ◽  
Wenhui Liu

To address the difficulty of analyzing and monitoring the process quality state under manufacturing big data, this paper proposes a quality fluctuation monitoring method based on data cloud model similarity for data-driven production processes. First, the randomness of state fluctuation is characterized by entropy and hyperentropy features. Then, a cloud pool drive model between the quality fluctuation monitoring parameters is built. On this basis, the cloud model similarity degree is defined and calculated from the perspective of the maximum fluctuation border to realize process state analysis and monitoring. Finally, an experiment is conducted to verify the adaptability and performance of the cloud model similarity-based quality control approach. The results indicate that the proposed approach is a feasible and acceptable method for process fluctuation monitoring and quality stability analysis in the production process.


2018 ◽  
Vol 25 (5) ◽  
pp. 984-995 ◽  
Author(s):  
Kun Luo ◽  
Xiaoyan Lei

Based on model similarity theory, this article derives the model similarity relationship of an elevated railway box girder at the elastic stage and designs a 1/10 scale box girder model, using a 32 m simply-supported box girder bridge from the Beijing–Shanghai Railway as the prototype. It then verifies the validity of the model design and the dynamic similarity between the 1/10 model and the prototype through constrained-mode and free-mode experiments on the 1/10 scale model, together with transient finite element calculation. The dynamic calculation model is used to analyze the errors introduced in producing the model and the effect of simplifying the model structure on the box girder modal frequencies and vibration response. Finally, the article studies the vibration transmissibility characteristics between the plates and along the longitudinal direction by means of model testing. It also discusses the effect of different bridge support stiffnesses on the box girder vibration. The results presented in this paper can provide a method for forecasting and evaluating the environmental vibration of existing or planned high-speed railways.


2018 ◽  
Author(s):  
Will P. M. Rowe ◽  
Anna Paola Carrieri ◽  
Cristina Alcon-Giner ◽  
Shabhonam Caim ◽  
Alex Shaw ◽  
...  

Motivation: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching, and classification of microbiome samples in near real-time.
Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can be used to efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we show that histosketches can be used to train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a Random Forest Classifier that could accurately predict whether the neonate had received antibiotic treatment (95% accuracy, precision 97%) and could subsequently be used to classify microbiome data streams in less than 12 seconds. We provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2GB microbiome in 50 seconds on a standard laptop using 4 cores, with the sketch occupying 3000 bytes of disk space.
Availability: Our implementation (HULK) is written in Go and is available at: https://github.com/will-rowe/hulk (MIT License)
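To make the idea of sketch-based Jaccard similarity estimation concrete, here is a minimal sketch using plain MinHash over k-mer sets. This is a simpler, related technique and not the histosketch method or the HULK implementation described above: it ignores k-mer frequencies and does not process data as a stream. The function names, the example sequences, and the parameter choices (`k=4`, 128 hash functions) are assumptions made for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(item, seed):
    # Deterministic 64-bit hash of item; the seed is mixed in through the
    # BLAKE2b salt so each seed behaves like a different hash function.
    digest = hashlib.blake2b(
        item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(items, num_hashes=128):
    """Keep, for each hash function, the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


def exact_jaccard(s, t):
    """Exact Jaccard similarity |s & t| / |s | t|, for comparison."""
    return len(s & t) / len(s | t)


# Hypothetical example sequences, not real microbiome data.
seq_a = "ACGTACGTGGCATTAGCCGATAGCTAGCTAGGATCCGATCGATTACG"
seq_b = "ACGTACGTGGCATTAGCCGTTAGCTAGCTAGGATCCGATCGAATACG"

set_a, set_b = kmers(seq_a, 4), kmers(seq_b, 4)
sk_a, sk_b = minhash_sketch(set_a), minhash_sketch(set_b)
print("exact    :", exact_jaccard(set_a, set_b))
print("estimated:", estimate_jaccard(sk_a, sk_b))
```

Each sketch has a fixed size regardless of how large the input is, which is what makes pairwise comparison and indexing with locality sensitive hashing cheap. The estimate becomes tighter as the number of hash functions grows.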

