similarity learning
Recently Published Documents


TOTAL DOCUMENTS: 313 (FIVE YEARS 141)

H-INDEX: 22 (FIVE YEARS 4)

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 131
Author(s):  
Sang Ho Oh ◽  
Seunghwa Back ◽  
Jongyoul Park

Patient similarity research is one of the most fundamental tasks in healthcare, supporting decisions without incurring additional time and cost in clinical practice. Patient similarity also applies to various medical fields, such as cohort analysis and personalized treatment recommendation. Because of this importance, patient similarity measurement is being actively studied. However, medical data are complex, irregular, and sequential, which makes accurate similarity measurement a significant challenge. Existing studies use supervised learning to calculate the similarity between patients and consider only one specific disease. This is unrealistic because a disease is usually accompanied by other conditions, so a method that measures similarity across multiple diseases is needed. This research proposes a convolutional neural network-based model that jointly combines feature learning and similarity learning to define similarity between patients with multiple diseases. We used cohort data from the National Health Insurance Sharing Service of Korea for the experiment. Experimental results indicate that the proposed model performs strongly compared with other existing models for measuring multiple-disease patient similarity.


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 3
Author(s):  
Giacomo Frisoni ◽  
Gianluca Moro ◽  
Giulio Carlassare ◽  
Antonella Carbonaro

The automatic extraction of biomedical events from the scientific literature has drawn keen interest in the last several years, recognizing complex and semantically rich graphical interactions otherwise buried in texts. However, very few works revolve around learning embeddings or similarity metrics for event graphs. This gap leaves biological relations unlinked and prevents the application of machine learning techniques to promote discoveries. Taking advantage of recent deep graph kernel solutions and pre-trained language models, we propose Deep Divergence Event Graph Kernels (DDEGK), an unsupervised inductive method to map events into low-dimensional vectors, preserving their structural and semantic similarities. Unlike most other systems, DDEGK operates at a graph level and does not require task-specific labels, feature engineering, or known correspondences between nodes. To this end, our solution compares events against a small set of anchor ones, trains cross-graph attention networks for drawing pairwise alignments (bolstering interpretability), and employs transformer-based models to encode continuous attributes. Extensive experiments have been done on nine biomedical datasets. We show that our learned event representations can be effectively employed in tasks such as graph classification, clustering, and visualization, also facilitating downstream semantic textual similarity. Empirical results demonstrate that DDEGK significantly outperforms other state-of-the-art methods.


2021 ◽  
Author(s):  
Seung-yeoun Kang ◽  
Jeong-hoon Mo

BACKGROUND Similarity-based machine-learning methodologies are suitable for personalized prediction and recommendation research, which is actively applied in the healthcare field as EHR data become widespread. In particular, a similarity learning model that carefully reflects age can be used efficiently to predict chronic diseases, which are closely related to ageing. OBJECTIVE We aimed to design a similarity model for patients in different age groups in order to predict two major chronic diseases: diabetes and hypertension. METHODS We developed an approach that learns the overlapping periods of two individuals by shifting their viewpoints to the future and the past, respectively. Based on this idea, we built separate similarity learning models for three sequential age-group intervals: 30-40, 40-50, and 50-60. Each model has the same structure, based on a deep neural network. For similarity learning, we used several items of demographic and bi-annual check-up information, together with diagnosis records, as input features, and disease-based yes-or-no similarity labels as output features. RESULTS When our methodology was applied to a pair of hypertension patients, a pair of diabetes patients, and a non-diabetes/diabetes patient pair, the similarity value was very high (close to 1) in the first two cases and low (close to 0) in the last case. This indicates that similarity learning appropriately reflects the disease status between individuals. In addition, we examined how the conventional single-timepoint methodology and our methodology differ in measured similarity for several special cases in which the patient's disease condition changes. In four such cases, the similarity results of the two methodologies differed by at least 0.2 and at most 0.9. This suggests that our methodology responds more sensitively to changes in the patient's condition over time and can be applied more efficiently to disease prediction in those cases. CONCLUSIONS We developed an age-sensitive similarity learning model for personalized prediction of chronic diseases targeting Koreans. By designing and training a deep similarity learning model on divided age groups, which has not been previously attempted, we showed that for cases in which a patient's disease pattern changes, similarity learning results are better than those of the conventional single-timepoint methodology. Moreover, we proposed the possibility of overcoming the data shortage limitations that frequently occur in medical datasets through a similarity learning model that considers patients’ age differences.


2021 ◽  
Author(s):  
Tan Nguyen ◽  
Erich Strohmaier ◽  
John Shalf

Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1670
Author(s):  
Hyundoo Jeong ◽  
Sungtae Shin ◽  
Hong-Gi Yeom

Single-cell sequencing provides novel means to interpret the transcriptomic profiles of individual cells. In-depth analysis of single-cell sequencing requires effective computational methods that accurately predict single-cell clusters, because single-cell sequencing techniques provide only the transcriptomic profile of each cell. Although an accurate estimate of cell-to-cell similarity is an essential first step toward reliable single-cell clustering, it is challenging to obtain: the estimate depends heavily on which genes are selected for similarity evaluation, and the optimal set of genes is typically unknown. Moreover, due to technical limitations, single-cell sequencing data include a large number of artificial zeros, and this technical noise makes it difficult to develop effective single-cell clustering algorithms. Here, we describe a novel single-cell clustering algorithm that can accurately predict single-cell clusters in large-scale single-cell sequencing by effectively reducing the zero-inflated noise and accurately estimating the cell-to-cell similarities. First, we construct an ensemble similarity network based on different similarity estimates and reduce the artificial noise using a random walk with restart framework. Then, starting from a large number of small but highly consistent clusters, we iteratively merge the pair of clusters with the maximum similarity until the predicted number of clusters is reached. Extensive performance evaluation shows that the proposed single-cell clustering algorithm can yield accurate single-cell clustering results and can help decipher the key messages underlying complex biological mechanisms.
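
The random walk with restart step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulation, so the column normalization, the restart probability of 0.5, the function names, and the toy 4-cell similarity matrix below are all assumptions made for illustration, not the paper's implementation.

```python
def column_normalize(weights):
    """Scale each column of a square similarity matrix so that it sums to 1."""
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [
        [weights[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(weights, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * T p + restart * e_seed until p stops changing."""
    n = len(weights)
    transition = column_normalize(weights)
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the L1 change between successive iterates is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: cells 0-2 are strongly connected,
# cell 3 is only weakly linked to cell 2.
toy = [
    [0.0, 0.9, 0.8, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
scores = random_walk_with_restart(toy, seed=0)
```

In this sketch, `scores[i]` is the steady-state probability of finding the walker at cell `i` when it keeps restarting from cell 0. Weak, isolated links (such as the one to cell 3) receive little probability mass, which is the intuition behind using such a walk to smooth a noisy similarity network.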


2021 ◽  
Author(s):  
Zhiting Wei ◽  
Sheng Zhu ◽  
Xiaohan Chen ◽  
Chenyu Zhu ◽  
Bin Duan ◽  
...  

Transcriptional phenotypic drug discovery has achieved great success, and various compound perturbation-based data resources, such as the Connectivity Map (CMap) and the Library of Integrated Network-Based Cellular Signatures (LINCS), have been presented. Computational strategies that fully mine these resources for phenotypic drug discovery have been proposed; among them, a fundamental issue is how to define a proper similarity between transcriptional profiles in order to elucidate drug mechanisms of action and identify new drug indications. Traditionally, this similarity has been defined in an unsupervised way, and because of the high dimensionality and high noise of these high-throughput data, it lacks robustness and has limited performance. In our study, we present Dr. Sim, a general learning-based framework that automatically infers a similarity measurement rather than relying on a manually designed one, and that can be used to characterize transcriptional phenotypic profiles for drug discovery with good generalization performance. We evaluated Dr. Sim on comprehensive publicly available in vitro and in vivo datasets for drug annotation and repositioning using high-throughput transcriptional perturbation data. The results indicate that Dr. Sim significantly outperforms existing methods and represents a conceptual improvement: learning transcriptional similarity facilitates the broad utility of high-throughput transcriptional perturbation data for phenotypic drug discovery. The source code and usage of Dr. Sim are available at https://github.com/bm2-lab/DrSim/.


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6109
Author(s):  
Nkosikhona Dlamini ◽  
Terence L. van Zyl

Similarity learning using deep convolutional neural networks has been applied extensively to computer vision problems. This popularity is supported by its success in one-shot and zero-shot classification applications. Advances in similarity learning are essential for smaller datasets, or datasets in which few labelled examples exist per class, such as those used in wildlife re-identification. Improving the performance of similarity learning models involves developing new sampling techniques and designing loss functions better suited to training similarity in neural networks. However, the impact of these advances is tested on larger datasets, with limited attention given to smaller imbalanced datasets such as those found in unique wildlife re-identification. To this end, we test the advances in loss functions for similarity learning on several animal re-identification tasks. We add two new public datasets, Nyala and Lions, to the challenge of animal re-identification. Our results are state of the art on all public datasets tested except Pandas. The achieved Top-1 Recall is 94.8% on the Zebra dataset, 72.3% on the Nyala dataset, 79.7% on the Chimps dataset, and 88.9% on the Tiger dataset. For the Lion dataset, we set a new benchmark at 94.8%. We find that the best performing loss function across all datasets is generally the triplet loss; however, there is only a marginal improvement compared to the performance achieved by Proxy-NCA models. We demonstrate that no single neural network architecture combined with a loss function is best suited for all datasets, although VGG-11 may be the most robust first choice. Our results highlight the need for broader experimentation and exploration of loss functions and neural network architectures for wildlife re-identification, a more challenging task than classical benchmarks.
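
For readers unfamiliar with the triplet loss named in this abstract, the sketch below shows its standard form on plain embedding vectors. The margin of 0.2, the use of unsquared Euclidean distance, and the toy 2-D embeddings are assumptions chosen for illustration; the abstract does not state the settings the authors used, and some variants use squared distances instead.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: the positive shares the anchor's identity,
# the negatives belong to a different individual.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [1.0, 0.0]   # already far from the anchor
hard_negative = [0.2, 0.0]   # inside the margin

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

Here the easy negative contributes no loss, while the hard negative yields a positive loss because it lies within the margin. This is why the sampling techniques mentioned in the abstract matter: triplets that already satisfy the margin give no training signal.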


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257404
Author(s):  
Adam J. Kleinschmit ◽  
Elizabeth F. Ryder ◽  
Jacob L. Kerby ◽  
Barbara Murdoch ◽  
Sam Donovan ◽  
...  

As powerful computational tools and ‘big data’ transform the biological sciences, bioinformatics training is becoming necessary to prepare the next generation of life scientists. Furthermore, because the tools and resources employed in bioinformatics are constantly evolving, bioinformatics learning materials must be continuously improved. In addition, these learning materials need to move beyond today’s typical step-by-step guides to promote deeper conceptual understanding by students. One of the goals of the Network for Integrating Bioinformatics into Life Sciences Education (NIBLSE) is to create, curate, disseminate, and assess appropriate open-access bioinformatics learning resources. Here we describe the evolution, integration, and assessment of a learning resource that explores essential concepts of biological sequence similarity. Pre/post student assessment data from diverse life science courses show significant learning gains. These results indicate that the learning resource is a beneficial educational product for the integration of bioinformatics across curricula.

