mokapot: Fast and Flexible Semisupervised Learning for Peptide Detection

2021 ◽  
Vol 20 (4) ◽  
pp. 1966-1971
Author(s):  
William E. Fondrie ◽  
William S. Noble
2001 ◽  
Vol 11 (13) ◽  
pp. 1635-1638 ◽  
Author(s):  
Edward James Iorio ◽  
Yuefei Shao ◽  
Chao-Tsen Chen ◽  
Holger Wagner ◽  
W.Clark Still

2018 ◽  
Vol 21 (61) ◽  
pp. 67
Author(s):  
Cristian Cardellino ◽  
Laura Alonso Alemany

This work explores the use of word embeddings as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning: an unsupervised algorithm (i.e., the word embeddings) is trained on unlabeled data separately as a first step, and then its results are used by a supervised classifier. In this work we primarily focus on two aspects of VSD trained with unsupervised word representations. First, we show how the domain where the word embeddings are trained affects the performance of the supervised task. A specific domain can improve the results if this domain is shared with the domain of the supervised task, even if the word embeddings are trained with smaller corpora. Second, we show that the use of word embeddings can help the model generalize when compared to not using word embeddings. This means embeddings help by decreasing the model's tendency to overfit.
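The two-step structure of disjoint semisupervised learning described above can be sketched on toy data. This is only an illustration under stated assumptions, not the authors' pipeline: a truncated SVD of a small co-occurrence matrix stands in for the word embeddings, a nearest-centroid rule stands in for the supervised classifier, and all names (`learn_embeddings`, `fit_centroids`, `predict`) and the data are hypothetical.

```python
import numpy as np

def learn_embeddings(cooccurrence, dim):
    """Unsupervised step: embed every item using unlabeled co-occurrence counts.

    Assumption: truncated SVD is a stand-in for a real word-embedding model.
    """
    u, s, _ = np.linalg.svd(cooccurrence, full_matrices=False)
    return u[:, :dim] * s[:dim]

def fit_centroids(features, labels):
    """Supervised step: one centroid per class, using only the labeled items."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Assign each item to the class of its nearest centroid."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Hypothetical toy data: rows are items, columns are contexts.
# Items 0-2 share contexts 0-1; items 3-5 share contexts 2-3.
cooccurrence = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 0, 2, 2],
], dtype=float)

# Step 1 uses all six items and no labels.
embeddings = learn_embeddings(cooccurrence, dim=2)

# Step 2 uses labels for only two items (one per class).
labeled_idx = np.array([0, 3])
labeled_y = np.array([0, 1])
classes, centroids = fit_centroids(embeddings[labeled_idx], labeled_y)

predicted = predict(embeddings, classes, centroids)
```

The point of the sketch is the separation of the steps: the embedding function never sees a label, and the classifier never sees the raw counts.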


2021 ◽  
Vol 2021 ◽  
pp. 1-5
Author(s):  
Hai Zhu ◽  
Jie Zhang ◽  
Xingsi Xue

A sensor ontology models sensor information and knowledge in a machine-understandable way, with the aim of addressing the data heterogeneity problem on the Internet of Things (IoT). However, existing sensor ontologies are maintained independently for different requirements and may define the same concept with different terms or contexts, which itself yields a heterogeneity issue. Because the semantic relationships between sensor concepts are complex and the number of entities is large, finding identical entity correspondences is an error-prone task. To determine the sensor entity correspondences effectively, this work proposes a semisupervised learning-based sensor ontology matching technique. First, we borrow the idea of “centrality” from social networks to construct the training examples; then, we present an evolutionary algorithm- (EA-) based metamatching technique to train a model that aggregates different similarity measures; finally, we use the trained model to match the remaining entities. The experiment uses the benchmark as well as three real sensor ontologies to test our proposal’s performance. The experimental results show that our approach is able to determine high-quality sensor entity correspondences in all matching tasks.


2021 ◽  
pp. 1-27
Author(s):  
Tim Sainburg ◽  
Leland McInnes ◽  
Timothy Q. Gentner

UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.


Author(s):  
Bingbing Xu ◽  
Huawei Shen ◽  
Qi Cao ◽  
Keting Cen ◽  
Xueqi Cheng

Graph convolutional networks have achieved remarkable success in semisupervised learning on graph-structured data. The key to graph-based semisupervised learning is capturing the smoothness of labels or features over nodes that is imposed by the graph structure. Previous methods, both spectral and spatial, define graph convolution as a weighted average over neighboring nodes and then learn graph convolution kernels that leverage this smoothness to improve the performance of graph-based semisupervised learning. One open challenge is how to determine an appropriate neighborhood that reflects the relevant smoothness information manifested in the graph structure. In this paper, we propose GraphHeat, which leverages the heat kernel to enhance low-frequency filters and enforce smoothness in the signal variation on the graph. GraphHeat uses the local structure of the target node under heat diffusion to determine its neighboring nodes flexibly, without the constraint of order suffered by previous methods. GraphHeat achieves state-of-the-art results in the task of graph-based semisupervised classification across three benchmark datasets: Cora, Citeseer and Pubmed.
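As a rough illustration of the heat-kernel idea mentioned above, the sketch below computes a heat kernel on a small graph and uses it to pick neighbors. It assumes the kernel is e^{-sL} on the symmetric normalized Laplacian L, and that neighbors are the nodes whose kernel value passes a threshold; the scale `s`, the `threshold`, and every function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)           # avoid dividing by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def heat_kernel(adj, s):
    """Heat kernel exp(-s L) via the eigendecomposition of L.

    Each eigenvalue lam is scaled by exp(-s * lam), so small eigenvalues
    (low graph frequencies) are kept and large ones are damped.
    """
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    return (eigvecs * np.exp(-s * eigvals)) @ eigvecs.T

def heat_neighbors(adj, s, threshold):
    """Boolean mask: entry (i, j) is True if heat from i reaches j strongly.

    Assumption: a fixed threshold decides neighborhood membership.
    """
    return heat_kernel(adj, s) >= threshold

# Hypothetical example: a path graph 0 - 1 - 2 - 3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

kernel = heat_kernel(adj, s=1.0)
mask = heat_neighbors(adj, s=1.0, threshold=0.1)
```

Because the kernel value decays with distance along the graph rather than with a fixed hop count, the neighborhood of each node can differ in extent, which is the flexibility the abstract refers to.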

